Description

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to genetic engineering and more particularly to plant transformation in which a plant is transformed to express a heterologous gene.

[0002] Although great progress has been made in recent years with respect to transgenic plants which express foreign proteins such as herbicide resistant enzymes and viral coat proteins, very little is known about the major factors affecting expression of foreign genes in plants. Several potential factors could be responsible in varying degrees for the level of protein expression from a particular coding sequence. The level of a particular mRNA in the cell is certainly a critical factor.

[0003] The potential causes of low steady state levels of mRNA due to the nature of the coding sequence are many. First, full length RNA synthesis might not occur at a high frequency. This could, for example, be caused by the premature termination of RNA during transcription or by unexpected mRNA processing during transcription. Second, full length RNA could be produced but then processed (splicing, polyA addition) in the nucleus in a fashion that creates a nonfunctional mRNA. If the RNA is properly synthesized, terminated and polyadenylated, it then can move to the cytoplasm for translation. In the cytoplasm, mRNAs have distinct half lives that are determined by their sequences and by the cell type in which they are expressed. Some RNAs are very short-lived and some are much more long-lived. In addition, translational efficiency has an effect, of uncertain magnitude, on mRNA half-life. Furthermore, every RNA molecule folds into a particular structure, or perhaps family of structures, which is determined by its sequence. The particular structure of any RNA might lead to greater or lesser stability in the cytoplasm. Structure per se is probably also a determinant of mRNA processing in the nucleus. Unfortunately, it is impossible to predict, and nearly impossible to determine, the structure of any RNA (except for tRNA) in vitro or in vivo. However, it is likely that dramatically changing the sequence of an RNA will have a large effect on its folded structure. It is likely that structure per se or particular structural features also have a role in determining RNA stability.

[0004] Some particular sequences and signals have been identified in RNAs that have the potential for having a specific effect on RNA stability. This section summarizes what is known about these sequences and signals. These identified sequences often are A+T rich, and thus are more likely to occur in an A+T rich coding sequence such as a B.t. gene. The sequence motif ATTTA (or AUUUA as it appears in RNA) has been implicated as a destabilizing sequence in mammalian cell mRNA (Shaw and Kamen, 1986). No analysis of the function of this sequence in plants has been done. Many short lived mRNAs have A+T rich 3' untranslated regions, and these regions often have the ATTTA sequence, sometimes present in multiple copies or as multimers (e.g., ATTTATTTA...). Shaw and Kamen showed that the transfer of the 3' end of an unstable mRNA to a stable RNA (globin or VA1) decreased the stable RNA's half life dramatically. They further showed that a pentamer of ATTTA had a profound destabilizing effect on a stable message, and that this signal could exert its effect whether it was located at the 3' end or within the coding sequence. However, the number of ATTTA sequences and/or the sequence context in which they occur also appear to be important in determining whether they function as destabilizing sequences. Shaw and Kamen showed that a trimer of ATTTA had much less effect than a pentamer on mRNA stability and a dimer or a monomer had no effect on stability (Shaw and Kamen, 1987). Note that multimers of ATTTA such as a pentamer automatically create an A+T rich region. The destabilization was shown to be a cytoplasmic effect, not a nuclear one. In other unstable mRNAs, the ATTTA sequence may be present in only a single copy, but it is often contained in an A+T rich region. From the animal cell data collected to date, it appears that ATTTA at least in some contexts is important in stability, but it is not yet possible to predict which occurrences of ATTTA are destabilizing elements or whether any of these effects are likely to be seen in plants.
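
The distinction drawn above between single ATTTA copies and tandem multimers such as ATTTATTTA can be sketched as a small scanner. This is a minimal illustration, not a method from the cited studies: the function names are hypothetical, and the sketch assumes that tandem copies overlap by their shared terminal A, as in the ATTTATTTA example, so that a pentamer spans 21 bases.

```python
import re


def find_attta(seq: str) -> list[int]:
    """Return 0-based start positions of every ATTTA, overlapping hits included."""
    seq = seq.upper().replace("U", "T")  # accept RNA (AUUUA) as well as DNA
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return [m.start() for m in re.finditer(r"(?=ATTTA)", seq)]


def longest_tandem_attta(seq: str) -> int:
    """Number of ATTTA units in the longest tandem array.

    Assumes tandem copies share their terminal A (ATTTATTTA is a dimer),
    so consecutive units start exactly 4 bases apart.
    """
    starts = set(find_attta(seq))
    best = 0
    for s in starts:
        if s - 4 in starts:
            continue  # s lies inside an array, not at its first unit
        n = 1
        while s + 4 * n in starts:
            n += 1
        best = max(best, n)
    return best


if __name__ == "__main__":
    print(find_attta("CCATTTATTTAGG"))            # [2, 6]
    print(longest_tandem_attta("ATTT" * 5 + "A"))  # 5
```

Because the animal-cell data show no effect for a monomer or dimer, a scan like this only flags candidates; whether any hit matters in plants is exactly the open question raised above.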
[0005] Some studies on mRNA degradation in animal cells also indicate that RNA degradation may begin in some cases with nucleolytic attack in A+T rich regions. It is not clear if these cleavages occur at ATTTA sequences. There are also examples of mRNAs that have differential stability depending on the cell type in which they are expressed or on the stage within the cell cycle at which they are expressed. For example, histone mRNAs are stable during DNA synthesis but unstable if DNA synthesis is disrupted. The 3' end of some histone mRNAs seems to be responsible for this effect (Pandey and Marzluff, 1987). It does not appear to be mediated by ATTTA, nor is it clear what controls the differential stability of this mRNA. Another example is the differential stability of IgG mRNA in B lymphocytes during B cell maturation (Genovese and Milcarek, 1988). A final example is the instability of a mutant beta-thalassemic globin mRNA. In bone marrow cells, where this gene is normally expressed, the mutant mRNA is unstable, while the wild-type mRNA is stable. When the mutant gene is expressed in HeLa or L cells in vitro, the mutant mRNA shows no instability (Lim et al., 1988). These examples all provide evidence that mRNA stability can be mediated by cell type or cell cycle specific factors. Furthermore, this type of instability is not yet associated with specific sequences. Given these uncertainties, it is not possible to predict which RNAs are likely to be unstable in a given cell. In addition, even the ATTTA motif may act differentially depending on the nature of the cell in which the RNA is present: Shaw and Kamen (1987) have reported that activation of protein kinase C can block degradation mediated by ATTTA.
[0006] The addition of a polyadenylate string to the 3' end is common to most eucaryotic mRNAs, both plant and animal. The currently accepted view of polyA addition is that the nascent transcript extends beyond the mature 3' terminus. Contained within this transcript are signals for polyadenylation and proper 3' end formation. This processing at the 3' end involves cleavage of the mRNA and addition of polyA to the mature 3' end. By searching for consensus sequences near the polyA tract in both plant and animal mRNAs, it has been possible to identify consensus sequences that apparently are involved in polyA addition and 3' end cleavage. The same consensus sequences seem to be important to both of these processes. These signals are typically a variation on the sequence AATAAA. In animal cells, some functional variants of this sequence have been identified; in plant cells there seems to be an extended range of functional sequences (Wickens and Stephenson, 1984; Dean et al., 1986). Because all of these consensus sequences are variations on AATAAA, they all are A+T rich sequences. This sequence is typically found 15 to 20 bp before the polyA tract in a mature mRNA. Experiments in animal cells indicate that this sequence is involved in both polyA addition and 3' maturation. Site directed mutations in this sequence can disrupt these functions (Conway and Wickens, 1988; Wickens et al., 1987). However, it has also been observed that sequences up to 50 to 100 bp 3' to the putative polyA signal are also required; i.e., a gene that has a normal AATAAA but whose downstream sequence has been replaced or disrupted does not get properly polyadenylated (Gil and Proudfoot, 1984; Sadofsky and Alwine, 1984; McDevitt et al., 1984). That is, the polyA signal itself is not sufficient for complete and proper processing. It is not yet known what specific downstream sequences are required in addition to the polyA signal, or if there is a specific sequence that has this function. Therefore, sequence analysis can only identify potential polyA signals.
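
Since sequence analysis can only identify potential signals, any scan is necessarily positional. The sketch below checks whether an AATAAA lies roughly 15 to 20 bp upstream of the polyA tract of a mature mRNA sequence. It is hypothetical: the distance is assumed to be measured from the last base of the hexamer to the first A of the tail, and the tail is taken to be the terminal run of at least 10 A's. Both choices are illustrative and are not taken from the cited work.

```python
import re


def signals_before_tail(mrna: str, signal: str = "AATAAA",
                        min_gap: int = 15, max_gap: int = 20,
                        min_tail: int = 10) -> list[int]:
    """Start positions of `signal` lying min_gap..max_gap bases upstream of the polyA tail."""
    seq = mrna.upper().replace("U", "T")
    # Leftmost match of the terminal A-run; any A's directly preceding the
    # true tail are absorbed into it, which is a simplification.
    tail = re.search(r"A{%d,}$" % min_tail, seq)
    if tail is None:
        return []  # no recognisable polyA tail
    tail_start = tail.start()
    hits = []
    for m in re.finditer("(?=%s)" % signal, seq[:tail_start]):
        gap = tail_start - (m.start() + len(signal))  # bases between hexamer and tail
        if min_gap <= gap <= max_gap:
            hits.append(m.start())
    return hits


if __name__ == "__main__":
    mrna = "GC" * 5 + "AATAAA" + "GC" * 8 + "A" * 20
    print(signals_before_tail(mrna))
```

A hit here says only that a consensus hexamer sits where a functional signal would be expected; as noted above, the downstream sequences that are also required cannot yet be recognized from sequence alone.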

[0007] In naturally occurring mRNAs that are normally polyadenylated, it has been observed that disrupting this process, either by altering the polyA signal or other sequences in the mRNA, can have profound effects on the level of functional mRNA. This has been observed in several naturally occurring mRNAs, with results that are so far gene specific. No general rules can yet be derived from the study of mutants of these natural genes, and no rules can be applied to heterologous genes. Below are four examples:

1. In a globin gene, absence of a proper polyA site leads to improper termination of transcription. It is likely, but not proven, that the improperly terminated RNA is nonfunctional and unstable (Proudfoot et al., 1987).

2. In a globin gene, absence of a functional polyA signal can lead to a 100-fold decrease in the level of mRNA 
accumulation (Proudfoot et al., 1987). 

3. A globin gene polyA site was placed into the 3' ends of two different histone genes. The histone genes contain a secondary structure (stem-loop) near their 3' ends. The amount of properly polyadenylated histone mRNA produced from these chimeras decreased as the distance between the stem-loop and the polyA site increased. Also, the two histone genes produced greatly different levels of properly polyadenylated mRNA. This suggests an interaction between the polyA site and other sequences on the mRNA that can modulate mRNA accumulation (Pandey and Marzluff, 1987).

4. The soybean leghemoglobin gene has been introduced into HeLa cells, and it has been determined that this plant gene contains a "cryptic" polyadenylation signal that is active in animal cells but is not utilized in plant cells. This leads to the production of a new polyadenylated mRNA that is nonfunctional. This again shows that analysis of a gene in one cell type cannot predict its behavior in alternative cell types (Wiebauer et al., 1988).

[0008] From these examples, it is clear that in natural mRNAs proper polyadenylation is important in mRNA accumulation, and that disruption of this process can affect mRNA levels significantly. However, insufficient knowledge exists to predict the effect of changes in a normal gene. In a heterologous gene, where we do not know if the putative polyA sites (consensus sequences) are functional, it is even harder to predict the consequences. However, it is possible that the putative sites identified are dysfunctional. That is, these sites may not act as proper polyA sites, but instead function as aberrant sites that give rise to unstable mRNAs.

[0009] In animal cell systems, AATAAA is by far the most common signal identified in mRNAs upstream of the polyA, but at least four variants have also been found (Wickens and Stephenson, 1984). In plants, not nearly so much analysis has been done, but it is clear that multiple sequences similar to AATAAA can be used. The plant sites below called major or minor refer only to the study of Dean et al. (1986), which analyzed only three types of plant gene. The designation of polyadenylation sites as major or minor refers only to the frequency of their occurrence as functional sites in naturally occurring genes that have been analyzed. In the case of plants this is a very limited database. It is hard to predict with any certainty that a site designated major or minor is more or less likely to function partially or completely when found in a heterologous gene such as a B.t. gene.

EP 0 385 962 B1 





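Designation   Sequence   Site type
PA            AATAAA     Major consensus site
P1A           AATAAT     Major plant site
P2A           AACCAA     Minor plant site
P3A           ATATAA     Minor plant site
P4A           AATCAA     Minor plant site
P5A           ATACTA     Minor plant site
P6A           ATAAAA     Minor plant site
P7A           ATGAAA     Minor plant site
P8A           AAGCAT     Minor plant site
P9A           ATTAAT     Minor plant site
P10A          ATACAT     Minor plant site
P11A          AAAATA     Minor plant site
P12A          ATTAAA     Minor animal site
P13A          AATTAA     Minor animal site
P14A          AATACA     Minor animal site
P15A          CATAAA     Minor animal site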
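
The list of consensus hexamers can be applied mechanically. The sketch below reports every occurrence of the fifteen sequences with its designation and class. It is only an illustration: as the preceding paragraph stresses, a hit identifies a putative site, not a functional one, and the function name is hypothetical.

```python
POLYA_SIGNALS = {
    "PA": ("AATAAA", "major consensus site"),
    "P1A": ("AATAAT", "major plant site"),
    "P2A": ("AACCAA", "minor plant site"),
    "P3A": ("ATATAA", "minor plant site"),
    "P4A": ("AATCAA", "minor plant site"),
    "P5A": ("ATACTA", "minor plant site"),
    "P6A": ("ATAAAA", "minor plant site"),
    "P7A": ("ATGAAA", "minor plant site"),
    "P8A": ("AAGCAT", "minor plant site"),
    "P9A": ("ATTAAT", "minor plant site"),
    "P10A": ("ATACAT", "minor plant site"),
    "P11A": ("AAAATA", "minor plant site"),
    "P12A": ("ATTAAA", "minor animal site"),
    "P13A": ("AATTAA", "minor animal site"),
    "P14A": ("AATACA", "minor animal site"),
    "P15A": ("CATAAA", "minor animal site"),
}


def scan_polya_signals(seq: str) -> list[tuple[int, str, str, str]]:
    """All (position, designation, hexamer, class) hits, overlapping ones included."""
    seq = seq.upper()
    hits = []
    for name, (motif, kind) in POLYA_SIGNALS.items():
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, name, motif, kind))
            pos = seq.find(motif, pos + 1)  # advance one base so overlaps are kept
    return sorted(hits)


if __name__ == "__main__":
    for hit in scan_polya_signals("GGAATAAAATAGG"):
        print(hit)
```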



[0010] Another type of RNA processing that occurs in the nucleus is intron splicing. Nearly all of the work on intron processing has been done in animal cells, but some data are emerging from plants. Intron processing depends on proper 5' and 3' splice junction sequences. Consensus sequences for these junctions have been derived for both animal and plant mRNAs, but only a few nucleotides are known to be invariant. Therefore, it is hard to predict with any certainty whether a putative splice junction is functional or partially functional based solely on sequence analysis. In particular, the only invariant nucleotides are GT at the 5' end of the intron and AG at the 3' end of the intron. In plants, at every nearby position, either within the intron or in the exon flanking the intron, all four nucleotides can be found, although some positions show some nucleotide preference (Brown, 1986; Hanley and Schuler, 1988).

[0011] A plant intron has been moved from a patatin gene into a GUS gene. To do this, site directed mutagenesis was performed to introduce new restriction sites, and this mutagenesis changed several nucleotides in the intron and exon sequences flanking the GT and AG. This intron still functioned properly, indicating the importance of the GT and AG and the flexibility at other nucleotide positions. There are of course many occurrences of GT and AG in all genes that do not function as intron splice junctions, so there must be some other sequence or structural features that identify splice junctions. In plants, one such feature appears to be base composition per se. Wiebauer et al. (1988) and Goodall et al. (1988) have analyzed plant introns and exons and found that exons have ~50% A+T while introns have ~70% A+T. Goodall et al. (1988) also created an artificial plant intron that has consensus 5' and 3' splice junctions and a random A+T rich internal sequence. This intron was spliced correctly in plants. When the internal segment was replaced by a G+C rich sequence, splicing efficiency was drastically reduced. These two examples demonstrate that intron recognition in plants may depend on very general features: splice junctions that have a great deal of sequence diversity, and A+T richness of the intron itself. This, of course, makes it difficult to predict from sequence alone whether any particular sequence is likely to function as an active or partially active intron for RNA processing.
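
The point that GT and AG alone are far too common to mark introns can be made concrete. The hypothetical helper below enumerates GT...AG spans and keeps only those whose interior is at least 70% A+T, the intron-like composition reported above. The length bounds are arbitrary illustrative assumptions, and the exhaustive pairing is quadratic, so this is a teaching sketch rather than a splice-site predictor.

```python
def candidate_introns(seq: str, min_len: int = 60, max_len: int = 400,
                      min_at: float = 0.70) -> list[tuple[int, int, float]]:
    """(start, end, A+T fraction) for GT...AG spans with an A+T-rich interior.

    `end` is exclusive and includes the terminal AG.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    found = []
    for d in donors:
        for a in acceptors:
            length = a + 2 - d  # GT through AG inclusive
            if not (max(min_len, 5) <= length <= max_len):
                continue  # the floor of 5 keeps the interior non-empty
            interior = seq[d + 2:a]
            at = (interior.count("A") + interior.count("T")) / len(interior)
            if at >= min_at:
                found.append((d, a + 2, round(at, 2)))
    return found


if __name__ == "__main__":
    seq = "GCGCGC" + "GT" + "AT" * 40 + "AG" + "GCGCGC"
    print(candidate_introns(seq))
```

On a real A+T rich coding sequence such a scan typically returns many overlapping spans, which illustrates why composition plus GT/AG gives little predictive power on its own.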
[0012] B.t. genes, being A+T rich, contain numerous stretches of various lengths that have 70% or greater A+T. The number of such stretches identified by sequence analysis depends on the length of sequence scanned.
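
That the count of stretches with 70% or greater A+T depends on how the scan is set up is easy to see in a sliding-window sketch. The window length and the rule of merging overlapping rich windows are assumptions chosen for illustration; the text specifies neither.

```python
def at_rich_stretches(seq: str, window: int = 20,
                      threshold: float = 0.70) -> list[tuple[int, int]]:
    """Merge consecutive windows with A+T fraction >= threshold into (start, end) spans."""
    seq = seq.upper()
    spans = []
    start = None
    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        rich = (w.count("A") + w.count("T")) / window >= threshold
        if rich and start is None:
            start = i  # first rich window of a new stretch
        elif not rich and start is not None:
            spans.append((start, i - 1 + window))  # end of the last rich window
            start = None
    if start is not None:
        spans.append((start, len(seq)))
    return spans


if __name__ == "__main__":
    seq = "G" * 30 + "A" * 30 + "G" * 30
    print(at_rich_stretches(seq), at_rich_stretches(seq, window=10))
```

Running the same sequence with different window lengths generally reports stretches of different number and extent, which is the dependence the paragraph describes.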

[0013] As for polyadenylation described above, there are complications in predicting what sequences might be utilized as splice sites in any given gene. First, many naturally occurring genes have alternative splicing pathways that create alternative combinations of exons in the final mRNA (Gallego and Nadal-Ginard, 1988; Helfman and Ricci, 1988; Tsurushita and Korn, 1989). That is, some splice junctions are apparently recognized under some circumstances or in certain cell types, but not in others. The rules governing this are not understood. In addition, there can be an interaction between processing paths such that utilization of a particular polyadenylation site can interfere with splicing at a nearby splice site and vice versa (Adami and Nevins, 1988; Brady and Wold, 1988; Marzluff and Pandey, 1988). Again, no predictive rules are available. Also, sequence changes in a gene can drastically alter the utilization of particular splice junctions. For example, in a bovine growth hormone gene, small deletions in an exon a few hundred bases downstream of an intron cause the splicing efficiency of the intron to drop from greater than 95% to less than 2% (essentially nonfunctional). Other deletions, however, have essentially no effect (Hampson and Rottman, 1988). Finally, a variety of in vitro and in vivo experiments indicate that mutations that disrupt normal splicing lead to rapid degradation of the RNA in the nucleus. Splicing is a multistep process in the nucleus, and mutations in normal splicing can lead to blockades in the process at a variety of steps. Any of these blockades can then lead to an abnormal and unstable RNA. Studies of mutants of normally processed (polyadenylation and splicing) genes are relevant to the study of heterologous genes such as B.t. genes. B.t. genes might contain functional signals that lead to the production of aberrant nonfunctional mRNAs, and these mRNAs are likely to be unstable. But B.t. genes are perhaps even more likely to contain signals that are analogous to mutant signals in a natural gene. As shown above, these mutant signals are very likely to cause defects in the processing pathways whose consequence is to produce unstable mRNAs.

[0014] It is not known with any certainty what signals RNA transcription termination in plant or animal cells. Some studies on animal genes indicate that stretches of sequence rich in T cause termination by calf thymus RNA polymerase II in vitro. These studies have shown that the 3' ends of in vitro terminated transcripts often lie within runs of T such as T5, T6 or T7. Other identified sites have not been composed solely of T, but have had one or more other nucleotides as well. Termination has been found to occur within the sequences TATTTTTT, ATTCTC and TTCTT (Dedrick et al., 1987; Reines et al., 1987). In the case of the latter two, the context in which the sequence is found has been C+T rich as well. It is not known if this is essential. Other studies have implicated stretches of A as potential transcriptional terminators. An interesting example from SV40 illustrates the uncertainty in defining terminators based on sequence alone. One potential terminator in SV40 was identified as being A rich and having a region of dyad symmetry (potential stem-loop) 5' to the A rich stretch. However, a second terminator identified experimentally downstream in the same gene was not A rich and included no potential secondary structure (Kessler et al., 1988). Of course, due to the A+T content of B.t. genes, they are rich in runs of A or T that could act as terminators. The importance of termination to stability of the mRNA is shown by the globin gene example described above. Absence of a normal polyA site leads to a failure in proper termination with a consequent decrease in mRNA.
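
Since the in vitro terminator data point to runs such as T5, T6 or T7 and to stretches of A, the simplest screen is a homopolymer-run finder. The sketch below is hypothetical and deliberately crude; as the SV40 example shows, sequence alone does not define a terminator.

```python
import re


def homopolymer_runs(seq: str, base: str = "T",
                     min_len: int = 5) -> list[tuple[int, int]]:
    """(start, end) spans of runs of `base` at least `min_len` long; end is exclusive."""
    pattern = "%s{%d,}" % (re.escape(base.upper()), min_len)
    return [m.span() for m in re.finditer(pattern, seq.upper())]


if __name__ == "__main__":
    seq = "GCTA" + "T" * 6 + "GC" + "A" * 6 + "GC"
    print("T runs:", homopolymer_runs(seq, "T"))
    print("A runs:", homopolymer_runs(seq, "A"))
```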
[0015] There is also an effect on mRNA stability due to the translation of the mRNA. Premature translational termination in human triose phosphate isomerase leads to instability of the mRNA (Daar et al., 1988). Another example is the beta-thalassemic globin mRNA described above that is specifically unstable in bone marrow cells (Lim et al., 1988). The defect in this mutant gene is a single base pair deletion at codon 44 that leads to translational termination (a nonsense codon) at codon 60. Compared to properly translated normal globin mRNA, this mutant RNA is very unstable. These results indicate that an improperly translated mRNA is unstable. Other work in yeast indicates that proper but poor translation can have an effect on mRNA levels. A heterologous gene was modified to convert certain codons to more yeast preferred codons. An overall 10-fold increase in protein production was achieved, but there was also about a 3-fold increase in mRNA (Hoekema et al., 1987). This indicates that more efficient translation can lead to greater mRNA stability, and that the effect of codon usage can be at the RNA level as well as the translational level. It is not clear from codon usage studies which codons lead to poor translation, or how this is coupled to mRNA stability.

[0016] EP-A-0 359 472 discloses modifying B.t. sequences to render them more plant-like. The sequence is modified
so that the codon usage in the sequence is approximately the same as the codon usage in a plant. In contrast, the 
claimed invention is related to a specific methodology for increasing the expression of the gene in a plant by removing 
the occurrence of particular DNA sequences. 
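
For contrast with the approach claimed here, codon usage itself is straightforward to quantify. The sketch below computes relative synonymous codon usage (RSCU: the observed count of a codon divided by the count expected if all synonymous codons were used equally) with the standard genetic code. Matching a sequence to plant codon usage would additionally require a reference table built from real plant genes; none is given here, so none is included.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def rscu(cds: str) -> dict[str, float]:
    """RSCU for every codon of the amino acids present in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    usage = rscu("CTGCTGCTGTTA")
    print({c: round(v, 2) for c, v in usage.items() if v})
```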
[0017] Therefore, it is an object of the present invention to provide a method for preparing synthetic plant genes which express their respective proteins at relatively high levels when compared to wild-type genes. It is a further object of the present invention to provide synthetic plant genes which express the crystal protein toxin of Bacillus thuringiensis at relatively high levels.

BRIEF DESCRIPTION OF THE DRAWINGS

[0018] 

Figure 1 illustrates the steps employed in modifying a wild-type gene to increase expression efficiency in plants.

Figure 2 illustrates a comparison of the changes in the modified B.t.k. HD-1 sequence of Example 1 (lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper line).

Figure 3 illustrates a comparison of the changes in the synthetic B.t.k. HD-1 sequence of Example 2 (lower line) versus the wild-type sequence of B.t.k. HD-1 which encodes the crystal protein toxin (upper line).

Figure 4 illustrates a comparison of the changes in the synthetic B.t.k. HD-73 sequence of Example 3 (lower line) versus the wild-type sequence of B.t.k. HD-73 (upper line).

Figure 5 represents a plasmid map of intermediate plant transformation vector cassette pMON893.

Figure 6 represents a plasmid map of intermediate plant transformation vector cassette pMON900.

Figure 7 represents a map for the disarmed T-DNA of A. tumefaciens ACO.

Figure 8 illustrates a comparison of the changes in the synthetic truncated B.t.k. HD-73 gene (amino acids 29-615 with an N-terminal Met-Ala) of Example 3 (lower line) versus the wild-type sequence of B.t.k. HD-73 (upper line).

Figure 9 illustrates a comparison of the changes in the synthetic/wild-type full length B.t.k. HD-73 sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 10 illustrates a comparison of the changes in the synthetic/modified full length B.t.k. HD-73 sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 11 illustrates a comparison of the changes in the fully synthetic full-length B.t.k. HD-73 sequence of Example 3 (lower line) versus the wild-type full-length sequence of B.t.k. HD-73 (upper line).

Figure 12 illustrates a comparison of the changes in the synthetic B.t.t. sequence of Example 5 (lower line) versus the wild-type sequence of B.t.t. which encodes the crystal protein toxin (upper line).

Figure 13 illustrates a comparison of the changes in the synthetic B.t. P2 sequence of Example 6 (lower line) versus the wild-type sequence of B.t. P2 (upper line).

Figure 14 illustrates a comparison of the changes in the synthetic B.t. entomocidus sequence of Example 7 (lower line) versus the wild-type sequence of B.t. entomocidus which encodes the Btent protein toxin (upper line).

Figure 15 illustrates a plasmid map for plant expression cassette vector pMON744.

Figure 16 illustrates a comparison of the changes in the synthetic potato leaf roll virus (PLRV) coat protein sequence of Example 9 (lower line) versus the wild-type coat protein sequence of PLRV (upper line).

STATEMENT OF THE INVENTION 

[0019] The present invention provides a method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants, which comprises:

a) identifying regions within said sequence with greater than four consecutive adenine or thymine nucleotides;

b) modifying the regions of step (a) which have two or more polyadenylation signals within a ten base sequence to remove said signals while maintaining a gene sequence which encodes said protein; and

c) modifying the 15-30 base regions surrounding the regions of step (a) to remove major plant polyadenylation signals, consecutive sequences containing more than one minor polyadenylation signal and consecutive sequences containing more than one ATTTA sequence while maintaining a gene sequence which encodes said protein.
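
Steps (a) to (c) amount to a screening rule followed by codon edits. The sketch below implements only the screening part, under several stated assumptions: a run in step (a) is any stretch of five or more bases drawn from {A, T} (mixed A/T, not only homopolymers); the major plant signals are taken to be PA and P1A and the minor signals P2A to P15A from the polyadenylation signal list given earlier; and the surrounding region of step (c) is taken as 30 bases on each side of the run. The names are hypothetical, and choosing replacement codons is not shown.

```python
import re

MAJOR_PLANT = ("AATAAA", "AATAAT")  # PA, P1A (assumed to be the "major" signals)
MINOR = ("AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA", "AAGCAT",
         "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA", "AATACA", "CATAAA")
FLANK = 30  # assumed width of the surrounding region on each side of a run


def count_hits(s: str, motifs) -> int:
    """Count occurrences of any motif in s, overlaps included."""
    return sum(1 for m in motifs for i in range(len(s) - len(m) + 1)
               if s.startswith(m, i))


def screen(seq: str) -> list[tuple[int, int, bool, bool]]:
    """For each A/T run longer than four bases: (start, end, flag_step_b, flag_step_c)."""
    seq = seq.upper()
    report = []
    for run in re.finditer(r"[AT]{5,}", seq):  # step (a)
        start, end = run.span()
        # Step (b): two or more signals inside any ten-base window of the run
        # (a run shorter than ten bases gets one window starting at the run).
        windows = [seq[i:i + 10] for i in range(start, max(start + 1, end - 9))]
        step_b = any(count_hits(w, MAJOR_PLANT + MINOR) >= 2 for w in windows)
        # Step (c): inspect the run together with its flanks.
        region = seq[max(0, start - FLANK):end + FLANK]
        step_c = (count_hits(region, MAJOR_PLANT) > 0
                  or count_hits(region, MINOR) > 1
                  or count_hits(region, ("ATTTA",)) > 1)
        report.append((start, end, step_b, step_c))
    return report


if __name__ == "__main__":
    print(screen("GCGC" + "AATAAATAAA" + "GCGCGCGC"))
```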

[0020] The invention further provides a method for modifying a wild-type structural gene sequence which encodes 
an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants which comprises: 

a) removing polyadenylation signals contained in said wild-type gene while retaining a sequence which encodes said protein; and

b) removing ATTTA sequences contained in said wild-type gene while retaining a sequence which encodes said protein.
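
One way to remove such sequences while retaining a sequence which encodes the protein is to restrict edits to synonymous codon swaps. The greedy procedure below is a hypothetical sketch, not the procedure of the invention: it visits codons left to right, accepts a synonymous codon only if it strictly lowers the number of unwanted motifs in the whole sequence, and repeats for a few passes. It guarantees that the encoded protein is unchanged, but not that every motif can be removed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def motif_count(seq: str, motifs) -> int:
    """Occurrences of any motif in seq, overlaps included."""
    return sum(1 for m in motifs for i in range(len(seq) - len(m) + 1)
               if seq.startswith(m, i))


def remove_motifs(cds: str, motifs, max_passes: int = 5) -> str:
    """Greedy synonymous-codon edits that never change the encoded protein."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for _ in range(max_passes):
        changed = False
        for i, codon in enumerate(codons):
            best, best_score = codon, motif_count("".join(codons), motifs)
            if best_score == 0:
                return "".join(codons)
            for alt in SYNONYMS[CODON_TABLE[codon]]:
                trial = codons[:i] + [alt] + codons[i + 1:]
                score = motif_count("".join(trial), motifs)
                if score < best_score:  # accept strict improvements only
                    best, best_score = alt, score
            if best != codon:
                codons[i] = best
                changed = True
        if not changed:
            break
    return "".join(codons)


if __name__ == "__main__":
    wild = "ATGAATAAATTTATTTATGGC"  # toy sequence, not a B.t. gene
    edited = remove_motifs(wild, ["AATAAA", "AATAAT", "ATTTA"])
    print(edited, translate(wild) == translate(edited))
```

A greedy, one-codon-at-a-time search can get stuck where two adjacent codons must change together; a real design would need to handle such cases.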

[0021] According to a further embodiment, a method for improving the expression of a heterologous gene in plants is provided, wherein said gene comprises a modified chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, and wherein said structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence so that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and said structural coding sequence does not contain more than 5 consecutive nucleotides consisting of either adenine or thymine residues.
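
The structural limit in this embodiment can be checked directly. The hypothetical helper below reports the longest run; by default it treats a run as any mix of A and T, and with `mixed=False` it counts only homopolymer runs of A or of T. Which reading matches the claim language is an interpretive assumption.

```python
import re


def longest_at_run(seq: str, mixed: bool = True) -> int:
    """Length of the longest run of A/T (mixed) or of a single base, A or T."""
    pattern = r"[AT]+" if mixed else r"A+|T+"
    return max((len(m.group()) for m in re.finditer(pattern, seq.upper())), default=0)


def within_limit(seq: str, limit: int = 5, mixed: bool = True) -> bool:
    """True if no run exceeds `limit` consecutive A/T nucleotides."""
    return longest_at_run(seq, mixed) <= limit


if __name__ == "__main__":
    print(longest_at_run("GGATATATGG"), longest_at_run("GGATATATGG", mixed=False))
```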

[0022] As a further embodiment, a method for improving the expression of a heterologous gene in plants is provided, wherein said gene comprises a modified chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence so that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and has the following characteristics:

said structural coding sequence has a region which is complementary to the following sequence:

GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC

said region in said coding sequence having eliminated 2 AACCAA sequences and 1 AATTAA sequence.
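
Because the region is defined as complementary to the listed sequence, the coding-strand bases are its reverse complement. The sketch below derives that strand, translates it with the standard genetic code, and confirms that neither AACCAA nor AATTAA remains. It assumes the region is read in frame from the first base of the reverse complement; the claim itself does not state the frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

LISTED = "GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC"


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(cds: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


if __name__ == "__main__":
    coding = reverse_complement(LISTED)
    print(coding)
    print(translate(coding))
    print([m for m in ("AACCAA", "AATTAA") if m in coding])  # []
```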

[0023] The present invention provides a method for preparing synthetic plant genes which encode the crystal protein toxin of Bacillus thuringiensis (B.t.). Suitable B.t. subspecies include, but are not limited to, B.t. kurstaki HD-1, B.t. kurstaki HD-73, B.t. sotto, B.t. berliner, B.t. thuringiensis, B.t. tolworthi, B.t. dendrolimus, B.t. alesti, B.t. galleriae, B.t. aizawai, B.t. subtoxicus, B.t. entomocidus, B.t. tenebrionis and B.t. san diego.

[0024] The expression of B.t. genes in plants is problematic. Although the expression of B.t. genes in plants at insecticidal levels has been reported, this accomplishment has not been straightforward. In particular, the expression of a full-length lepidopteran specific B.t. gene (comprising DNA from a B.t.k. isolate) has been reported to be unsuccessful in yielding insecticidal levels of expression in some plant species (Vaeck et al., 1987 and Barton et al., 1987).
[0025] It has been reported that expression of the full-length gene from B.t.k. HD-1 was detectable in tomato plants but that truncated genes led to a higher frequency of insecticidal plants with an overall higher level of expression. Truncated genes of B.t. berliner also led to a higher frequency of insecticidal plants in tobacco (Vaeck et al., 1987). On the other hand, insecticidal plants were obtained from lettuce transformants using a full-length gene.
[0026] It has also been reported that the full length gene from B.t.k. HD-73 gave some insecticidal effect in tobacco (Adang et al., 1987). However, the B.t. mRNA detected in these plants was only 1.7 kb compared to the expected 3.7 kb, indicating improper expression of the gene. It was suggested that this truncated mRNA was too short to encode a functional truncated toxin, but there must have been a low level of longer mRNA in some plants or no insecticidal activity would have been observed. Others have reported a large amount of shorter than expected mRNA from a truncated B.t.k. gene, although some mRNA of the expected size was also observed. In fact, it was suggested that expression of the full length gene is toxic to tobacco callus (Barton et al., 1987). The above illustrates that lepidopteran type B.t. genes are poorly expressed in plants compared to other chimeric genes previously expressed from the same promoter cassettes.

[0027] The expression of B.t.t. in tomato and potato is at levels similar to that of B.t.k. (i.e., poor). B.t.t. and B.t.k. genes share only limited sequence homology, but they share many common features in terms of base composition and the presence of particular A+T rich elements.

[0028] All reports in the field have noted the lower than expected expression of B.t. genes in plants. In general, insecticidal efficacy has been measured using insects very sensitive to B.t. toxin such as tobacco hornworm. Although it has been possible to obtain plants totally protected against tobacco hornworm, it is important to note that hornworm is up to 500 fold more sensitive to B.t. toxin than some agronomically important insect pests such as beet armyworm. It is therefore of interest to obtain transgenic plants that are protected against all important lepidopteran pests (or against Colorado potato beetle in the case of B.t. tenebrionis), and in addition to have a level of B.t. expression that provides an additional safety margin over and above the efficacious protection level. It is also important to devise plant genes which function reproducibly from species to species, so that insect resistant plants can be obtained in a predictable fashion.

[0029] In order to achieve these goals, it is important to understand the nature of the poorer than expected expression of B.t. genes in plants. The level of stable B.t. mRNA in plants is much lower than expected. That is, compared to other coding sequences driven by the same promoter, the level of B.t. mRNA measured by Northern analysis or nuclease protection experiments is much lower. For example, tomato plant 337 (Fischhoff et al., 1987) was selected as the best expressing plant with pMON9711, which contains the B.t.k. HD-1 KpnI fragment driven by the CaMV 35S promoter and contains the NOS-NPTII-NOS selectable marker gene. In this plant the level of B.t. mRNA is between 100 and 1000 fold lower than the level of NPTII mRNA, even though the 35S promoter is approximately 50-fold stronger than the NOS promoter (Sanders et al., 1987).
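
The size of the shortfall can be expressed as a single number. Assuming, as a rough approximation not stated in the source, that steady-state mRNA would scale with promoter strength if the coding sequence had no effect, the B.t. transcript should have been about 50 times more abundant than NPTII mRNA; instead it was 100 to 1000 times less abundant. The implied deficit relative to expectation is then:

```latex
\[
\frac{\left([\mathrm{B.t.}]/[\mathrm{NPTII}]\right)_{\text{expected}}}
     {\left([\mathrm{B.t.}]/[\mathrm{NPTII}]\right)_{\text{observed}}}
= \frac{50}{10^{-2}} \;\text{to}\; \frac{50}{10^{-3}}
= 5\times 10^{3} \;\text{to}\; 5\times 10^{4}.
\]
```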

[0030] The level of B.t. toxin protein detected in plants is consistent with the low level of B.t. mRNA. Moreover, the insecticidal efficacy of the transgenic plants correlates with the B.t. protein level, indicating that the toxin protein produced in plants is biologically active. Therefore, the low level of B.t. toxin expression may be the result of the low levels of B.t. mRNA.

[0031] Messenger RNA levels are determined by the rate of synthesis and the rate of degradation. It is the balance 
between these two that determines the steady state level of mRNA. The rate of synthesis has been maximized by the 
use of the CaMV 35S promoter, a strong constitutive plant-expressible promoter. The use of other plant promoters 
such as nopaline synthase (NOS), mannopine synthase (MAS) and ribulose bisphosphate carboxylase small subunit 
(RUBISCO) has not led to dramatic changes in the levels of B.t. toxin protein expression, indicating that the effects 
determining B.t. toxin protein levels are promoter independent. These data imply that the coding sequences of the 
genes encoding B.t. toxin proteins are somehow responsible for the poor expression level, and that this effect is 
manifested by a low level of accumulated stable mRNA.

[0032] Lower than expected levels of mRNA have been observed with four different lepidopteran-specific genes (two 
from B.t.k. HD-1, one from B.t. berliner and one from B.t.k. HD-73) as well as the gene from the coleopteran-specific 
B.t. tenebrionis. It appears that for lepidopteran-type B.t. genes these effects are manifested more strongly in the 
full-length coding sequences than in the truncated coding sequences. These effects are seen across plant species, 
although their magnitude seems greater in some plant species such as tobacco.

[0033] The nature of the coding sequences of B.t. genes distinguishes them from plant genes as well as from many other 
heterologous genes expressed in plants. In particular, B.t. genes are very rich (approximately 62%) in adenine (A) and 
thymine (T), while plant genes and most bacterial genes which have been expressed in plants are on the order of 
45-55% A+T. The A+T content of the genomes (and thus the genes) of any organism is a feature of that organism and 
reflects its evolutionary history. While within any one organism genes have similar A+T content, the A+T content can 
vary tremendously from organism to organism. For example, some Bacillus species have among the most A+T rich 
genomes, while some Streptomyces species are among the least A+T rich genomes (approximately 30 to 35% A+T).
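As a minimal illustration of the quantity discussed above, A+T content can be computed directly from a coding sequence, both overall and at each codon position (the per-position figure relates to the third-position bias described in the next paragraph). The function names are hypothetical, and the sketch assumes an in-frame sequence containing only A, C, G and T.

```python
def at_content(seq: str) -> float:
    """Fraction of bases in `seq` that are A or T (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def at_content_by_codon_position(cds: str) -> tuple:
    """A+T fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    return tuple(at_content(cds[pos:usable:3]) for pos in range(3))


if __name__ == "__main__":
    cds = "ATGGCAAAAGATTTA"  # hypothetical fragment, not a B.t. sequence
    print(at_content(cds), at_content_by_codon_position(cds))
```

A gene whose "excess" A+T sits mainly at synonymous sites would show a third-position value well above the first two.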

[0034] Due to the degeneracy of the genetic code and the limited number of codon choices for any amino acid, most 
of the "excess" A+T of the structural coding sequences of some Bacillus species is found in the third position of the 
codons. That is, genes of some Bacillus species have A or T as the third nucleotide in many codons. Thus A+T content 
can in part determine codon usage bias. In addition, it is clear that genes evolve for maximum function in the organism 
in which they evolve. This means that particular nucleotide sequences found in a gene from one organism, where they 
may play no role except to code for a particular stretch of amino acids, have the potential to be recognized as gene 
control elements in another organism (such as transcriptional promoters or terminators, polyA addition sites, intron 
splice sites, or specific mRNA degradation signals). It is perhaps surprising that such misread signals are not a more 
common feature of heterologous gene expression, but this can be explained in part by the relatively homogeneous 
A+T content (approximately 50%) of many organisms. This A+T content plus the nature of the genetic code put clear 
constraints on the likelihood of occurrence of any particular oligonucleotide sequence. Thus, a gene from E. coli with a 
50% A+T content is much less likely to contain any particular A+T rich segment than a gene from B. thuringiensis.
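The constraint described above can be made concrete with a deliberately simple model, assuming independent bases with A and T each occurring at half the A+T fraction and G and C each at half the remainder. Real genes are not random sequences, so the figures are only indicative; the function names and the 1800 bp length are illustrative assumptions.

```python
def motif_probability(motif: str, at_fraction: float) -> float:
    """Probability that a random position starts `motif` under an independent-base model.

    Assumes P(A) = P(T) = at_fraction / 2 and P(G) = P(C) = (1 - at_fraction) / 2.
    """
    p_base = {
        "A": at_fraction / 2, "T": at_fraction / 2,
        "G": (1 - at_fraction) / 2, "C": (1 - at_fraction) / 2,
    }
    prob = 1.0
    for base in motif.upper():
        prob *= p_base[base]
    return prob


def expected_occurrences(motif: str, at_fraction: float, length: int) -> float:
    """Expected number of occurrences on one strand of a random sequence of `length` bases."""
    positions = max(length - len(motif) + 1, 0)
    return positions * motif_probability(motif, at_fraction)


if __name__ == "__main__":
    # 1800 bp is an arbitrary, illustrative gene length.
    for at in (0.50, 0.62):
        n = expected_occurrences("ATTTA", at, 1800)
        print(f"A+T = {at:.0%}: about {n:.1f} ATTTA motifs expected")
```

Under this model an ATTTA pentamer is (0.31/0.25)^5, or about 2.9 times, more likely at a given position in a 62% A+T sequence than in a 50% A+T one; for a six-base motif made only of A and T, such as AATAAA, the ratio rises to about 3.6.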
[0035] As described above, the expression of B.t. toxin protein in plants has been problematic. Although the 
observations made in other systems described above offer the hope of a means to elevate the expression level of B.t. 
toxin proteins in plants, the success obtained by the present method is quite unexpected. Indeed, inasmuch as it has 
been recently reported that expression of the full-length B.t.k. toxin protein in tobacco makes callus tissue necrotic 
(Barton et al., 1987), one would reasonably expect high-level expression of B.t. toxin protein to be unattainable due to 
the reported toxicity effects.

[0036] In its most rigorous application, the method of the present invention involves the modification of an existing 
structural coding sequence ("structural gene") which codes for a particular protein by removal of ATTTA sequences 
and putative polyadenylation signals by site-directed mutagenesis of the DNA comprising the structural gene. It is most 
preferred that substantially all the polyadenylation signals and ATTTA sequences are removed, although enhanced 
expression levels are observed with only partial removal of either of the above identified sequences. Alternatively, if a 
synthetic gene is prepared which codes for the expression of the subject protein, codons are selected to avoid the 
ATTTA sequence and putative polyadenylation signals. For purposes of the present invention, putative polyadenylation 
signals include, but are not necessarily limited to, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, 
ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA. In replacing the ATTTA 
sequences and polyadenylation signals, codons are preferably utilized which avoid the codons which are rarely found 
in plant genomes.
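A minimal Python sketch of the scanning step, assuming only the coding (sense) strand needs checking because the signals act on the transcript. The motif list is the one given above; the function names are hypothetical, and the sketch only reports positions rather than choosing replacement codons.

```python
# Putative plant polyadenylation signals listed in the text, plus the ATTTA motif.
POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA",
    "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)
ATTTA = "ATTTA"


def find_all(seq: str, motif: str) -> list:
    """0-based start positions of every occurrence of `motif`, overlaps included."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def scan_for_signals(seq: str) -> dict:
    """Map each motif present in the sense strand to the positions where it starts."""
    seq = seq.upper()
    found = {}
    for motif in POLYA_SIGNALS + (ATTTA,):
        hits = find_all(seq, motif)
        if hits:
            found[motif] = hits
    return found


if __name__ == "__main__":
    example = "GGATTTAAATAAAGG"  # hypothetical test fragment
    print(scan_for_signals(example))
```

For the hypothetical fragment this reports an ATTTA starting at position 2 and an AATAAA starting at position 7 (0-based).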

[0037] Another embodiment of the present invention, represented in the flow diagram of Figure 1, employs a method 
for the modification of an existing structural gene, or alternatively the de novo synthesis of a structural gene, which 
method is somewhat less rigorous than the method first described above. Referring to Figure 1, the selected DNA 
sequence is scanned to identify regions with more than four consecutive adenine (A) or thymine (T) nucleotides. The 
A+T regions are scanned for potential plant polyadenylation signals. Although the absence of five or more consecutive 
A or T nucleotides eliminates most plant polyadenylation signals, if more than one of the minor polyadenylation signals 
is identified within ten nucleotides of each other, then the nucleotide sequence of this region is preferably altered to 
remove these signals while maintaining the original encoded amino acid sequence.
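A minimal sketch of the first step of the Figure 1 flow, assuming the fourteen signals not marked as major in Table II are the "minor" ones and reading "within ten nucleotides of each other" as start positions at most ten bases apart. Function names are hypothetical.

```python
import re

# The fourteen signals not marked as major (*) in Table II, treated here as "minor".
MINOR_SIGNALS = (
    "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA", "AAGCAT",
    "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)


def at_runs(seq: str, min_len: int = 5) -> list:
    """Half-open (start, end) spans of runs of at least `min_len` consecutive A/T bases."""
    pattern = re.compile(f"[AT]{{{min_len},}}")  # e.g. "[AT]{5,}"
    return [m.span() for m in pattern.finditer(seq.upper())]


def motif_starts(seq: str, motifs) -> list:
    """Sorted 0-based start positions of every occurrence of any motif in `motifs`."""
    seq = seq.upper()
    return sorted(i for i in range(len(seq)) for m in motifs if seq.startswith(m, i))


def clustered_minor_signals(seq: str, window: int = 10) -> list:
    """Pairs of minor-signal start positions lying within `window` nucleotides of each other."""
    starts = motif_starts(seq, MINOR_SIGNALS)
    return [(a, b) for i, a in enumerate(starts) for b in starts[i + 1:] if b - a <= window]
```

Any non-empty result from `clustered_minor_signals` would mark a region as a candidate for alteration under the rule above.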
[0038] The second step is to consider the 15 to 30 nucleotide regions surrounding the A+T rich region identified in 
step one. If the A+T content of the surrounding region is less than 80%, the region should be examined for 
polyadenylation signals. Alteration of the region based on polyadenylation signals is dependent upon (1) the number 
of polyadenylation signals present and (2) the presence of a major plant polyadenylation signal.
[0039] The extended region is examined for the presence of plant polyadenylation signals. The polyadenylation 
signals are removed by site-directed mutagenesis of the DNA sequence. The extended region is also examined for 
multiple copies of the ATTTA sequence, which are also removed by mutagenesis.
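The second and third steps can be sketched for a single A+T run. The text does not say whether the 15 to 30 nucleotides refer to each flank or to the whole surrounding region; this sketch assumes `flank` bases on each side (default 15) and applies the 80% threshold exactly as written. The function name is hypothetical.

```python
def review_extended_region(seq: str, run: tuple, flank: int = 15) -> dict:
    """Summarise the extended region around one A+T run.

    `run` is a half-open (start, end) span; `flank` bases are added on each side,
    clipped at the sequence ends.
    """
    seq = seq.upper()
    start, end = run
    region = seq[max(start - flank, 0):min(end + flank, len(seq))]
    at_fraction = sum(base in "AT" for base in region) / len(region)
    return {
        "region": region,
        "at_fraction": at_fraction,
        # Taken literally from the text: below 80% A+T, examine for polyadenylation signals.
        "examine_for_signals": at_fraction < 0.80,
        # Overlapping ATTTA copies are counted separately.
        "attta_copies": sum(region.startswith("ATTTA", i) for i in range(len(region))),
    }
```

A region reporting two or more ATTTA copies would be a target for mutagenesis under the third step.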

[0040] It is also preferred that regions comprising many consecutive A+T bases or G+C bases are disrupted, since 
these regions are predicted to have a higher likelihood to form hairpin structures due to self-complementarity. 
Therefore, insertion of heterogeneous base pairs would reduce the likelihood of self-complementary secondary structure 
formation, which is known to inhibit transcription and/or translation in some organisms. In most cases, the adverse 
effects may be minimized by using sequences which do not contain more than five consecutive A+T or G+C.
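A minimal sketch of the run-length rule (no more than five consecutive A+T or G+C), assuming a sequence of plain A, C, G and T; the names are hypothetical. It uses `itertools.groupby` to split the sequence into maximal A/T and G/C stretches.

```python
from itertools import groupby


def longest_class_runs(seq: str) -> dict:
    """Longest stretch of consecutive A/T bases and of consecutive G/C bases.

    Assumes `seq` contains only A, C, G and T; any other character is counted as G/C.
    """
    longest = {"AT": 0, "GC": 0}
    for is_at, group in groupby(seq.upper(), key=lambda base: base in "AT"):
        cls = "AT" if is_at else "GC"
        longest[cls] = max(longest[cls], sum(1 for _ in group))
    return longest


def within_run_limit(seq: str, max_run: int = 5) -> bool:
    """True if no run of A+T or of G+C is longer than `max_run` bases."""
    runs = longest_class_runs(seq)
    return runs["AT"] <= max_run and runs["GC"] <= max_run
```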

SYNTHETIC OLIGONUCLEOTIDES FOR MUTAGENESIS 


[0041] The oligonucleotides used in the mutagenesis are designed to maintain the proper amino acid sequence and 
reading frame and preferably not to introduce common restriction sites such as BglII, HindIII, SacI, KpnI, EcoRI, NcoI, 
PstI and SalI into the modified gene. These restriction sites are found in the multilinker insertion sites of cloning 
vectors such as plasmids pUC118 and pMON7258. Of course, the introduction of new polyadenylation signals, ATTTA 
sequences or consecutive stretches of more than five A+T or G+C should also be avoided. The preferred size for the 
oligonucleotides is around 40-50 bases, but fragments ranging from 18 to 100 bases have been utilized. In most cases, 
a minimum of 5 to 8 base pairs of homology to the template DNA on both ends of the synthesized fragment is 
maintained to ensure proper hybridization of the primer to the template. The oligonucleotides should avoid sequences 
longer than five base pairs of A+T or G+C. Codons used in the replacement of wild-type codons should preferably 
avoid the TA or CG doublet wherever possible. Codons are selected from a plant preferred codon table (such as 
Table I below) so as to avoid codons which are rarely found in plant genomes, and efforts should be made to select 
codons to preferably adjust the G+C content to about 50%.
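The oligonucleotide rules above can be expressed as a checklist. This is a sketch under stated assumptions: the recognition sequences are the standard ones for the named enzymes (all palindromic, so one strand suffices), the 18-100 nt bounds are the range reported here, the doublet test looks only inside in-frame codons, and polyadenylation-signal screening is left to a separate scan. Function names are hypothetical.

```python
import re

# Standard recognition sequences of the enzymes named in the text.
CLONING_SITES = {
    "BglII": "AGATCT", "HindIII": "AAGCTT", "SacI": "GAGCTC", "KpnI": "GGTACC",
    "EcoRI": "GAATTC", "NcoI": "CCATGG", "PstI": "CTGCAG", "SalI": "GTCGAC",
}


def oligo_problems(oligo: str) -> list:
    """List design-rule violations for a mutagenic oligonucleotide (empty list = passes)."""
    oligo = oligo.upper()
    problems = []
    if not 18 <= len(oligo) <= 100:
        problems.append(f"length {len(oligo)} outside the 18-100 nt range used")
    for enzyme, site in CLONING_SITES.items():
        if site in oligo:
            problems.append(f"contains {enzyme} site {site}")
    if "ATTTA" in oligo:
        problems.append("contains ATTTA")
    if re.search("[AT]{6,}|[GC]{6,}", oligo):
        problems.append("run of more than five consecutive A+T or G+C")
    return problems


def codons_with_doublets(cds: str) -> list:
    """0-based indices of in-frame codons containing a TA or CG doublet."""
    cds = cds.upper()
    return [i // 3 for i in range(0, len(cds) - 2, 3)
            if "TA" in cds[i:i + 3] or "CG" in cds[i:i + 3]]
```

Checking only the oligonucleotide itself is a simplification; a site could also be created across the junction with the flanking template, which a fuller check would scan for in the mutated gene.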



Table I

Preferred Codon Usage in Plants

Amino Acid   Codon   Percent Usage in Plants
ARG          CGA       7
             CGC      11
             CGG       5
             CGU      25
             AGA      29
             AGG      23
LEU          CUA       8
             CUC      20
             CUG      10
             CUU      28
             UUA       5
             UUG      30
SER          UCA      14
             UCC      26
             UCG       3
             UCU      21
             AGC      21
             AGU      15
THR          ACA      21
             ACC      41
             ACG       7
             ACU      31
PRO          CCA      45
             CCC      19
             CCG       9
             CCU      26
ALA          GCA      23
             GCC      32
             GCG       3
             GCU      41
GLY          GGA      32
             GGC      20
             GGG      11
             GGU      37
ILE          AUA      12
             AUC      45
             AUU      43
VAL          GUA       9
             GUC      20
             GUG      28
             GUU      43
LYS          AAA      36
             AAG      64
ASN          AAC      72
             AAU      28
GLN          CAA      64
             CAG      36
HIS          CAC      65
             CAU      35
GLU          GAA      48
             GAG      52
ASP          GAC      48
             GAU      52
TYR          UAC      68
             UAU      32
CYS          UGC      78
             UGU      22
PHE          UUC      56
             UUU      44
MET          AUG     100
TRP          UGG     100
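As a simplified illustration of selecting codons from Table I, the sketch below picks, for each amino acid, the most frequently used codon that avoids the TA and CG doublets, falling back to the most frequent codon when every choice contains one. The one-letter codes, the function names and the single-best-codon strategy are assumptions for illustration; the text describes normalizing against the table, not always choosing one codon.

```python
# Table I percentages, rewritten with T in place of U and one-letter amino-acid codes.
PLANT_CODON_USAGE = {
    "R": {"CGA": 7, "CGC": 11, "CGG": 5, "CGT": 25, "AGA": 29, "AGG": 23},
    "L": {"CTA": 8, "CTC": 20, "CTG": 10, "CTT": 28, "TTA": 5, "TTG": 30},
    "S": {"TCA": 14, "TCC": 26, "TCG": 3, "TCT": 21, "AGC": 21, "AGT": 15},
    "T": {"ACA": 21, "ACC": 41, "ACG": 7, "ACT": 31},
    "P": {"CCA": 45, "CCC": 19, "CCG": 9, "CCT": 26},
    "A": {"GCA": 23, "GCC": 32, "GCG": 3, "GCT": 41},
    "G": {"GGA": 32, "GGC": 20, "GGG": 11, "GGT": 37},
    "I": {"ATA": 12, "ATC": 45, "ATT": 43},
    "V": {"GTA": 9, "GTC": 20, "GTG": 28, "GTT": 43},
    "K": {"AAA": 36, "AAG": 64},
    "N": {"AAC": 72, "AAT": 28},
    "Q": {"CAA": 64, "CAG": 36},
    "H": {"CAC": 65, "CAT": 35},
    "E": {"GAA": 48, "GAG": 52},
    "D": {"GAC": 48, "GAT": 52},
    "Y": {"TAC": 68, "TAT": 32},
    "C": {"TGC": 78, "TGT": 22},
    "F": {"TTC": 56, "TTT": 44},
    "M": {"ATG": 100},
    "W": {"TGG": 100},
}


def preferred_codon(aa: str) -> str:
    """Most-used codon for `aa` without a TA or CG doublet; else the most-used codon."""
    usage = PLANT_CODON_USAGE[aa]
    clean = {codon: pct for codon, pct in usage.items() if "TA" not in codon and "CG" not in codon}
    pool = clean or usage  # e.g. both tyrosine codons contain TA
    return max(pool, key=pool.get)


def back_translate(protein: str) -> str:
    """Naive back-translation using one preferred codon per amino acid (no stop codon)."""
    return "".join(preferred_codon(aa) for aa in protein.upper())


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq`."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)
```

Because one codon is used for every occurrence of an amino acid, this does not reproduce the table's usage proportions and will not in general land near 50% G+C; a fuller design would choose among acceptable codons and then re-check for polyadenylation signals, ATTTA and long A+T or G+C runs.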



[0042] Regions with many consecutive A+T bases or G+C bases are predicted to have a higher likelihood to form 
hairpin structures due to self-complementarity. Disruption of these regions by the insertion of heterogeneous base 
pairs is preferred and should reduce the likelihood of the formation of self-complementary secondary structures such 
as hairpins which are known in some organisms to inhibit transcription (transcriptional terminators) and translation 
(attenuators). However, it is difficult to predict the biological effect of a potential hairpin forming region.
[0043] It is evident to those skilled in the art that while the above description is directed toward the modification of 
the DNA sequences of wild-type genes, the present method can be used to construct a completely synthetic gene for 
a given amino acid sequence. Regions with five or more consecutive A+T or G+C nucleotides should be avoided. 
Codons should be selected avoiding the TA and CG doublets in codons whenever possible. Codon usage can be 
normalized against a plant preferred codon usage table (such as Table I) and the G+C content preferably adjusted to 
about 50%. The resulting sequence should be examined to ensure that there are minimal putative plant polyadenylation 
signals and ATTTA sequences. Restriction sites found in commonly used cloning vectors are also preferably avoided. 
However, placement of several unique restriction sites throughout the gene is useful for analysis of gene expression 
or construction of gene variants.

Plant Gene Construction 


[0044] The expression of a plant gene which exists in double-stranded DNA form involves transcription of messenger 
RNA (mRNA) from one strand of the DNA by the RNA polymerase enzyme, and the subsequent processing of the mRNA 
primary transcript inside the nucleus. This processing involves a 3' non-translated region which directs the addition of 
polyadenylate nucleotides to the 3' end of the RNA. Transcription of DNA into mRNA is regulated by a region of DNA 
usually referred to as the "promoter." The promoter region contains a sequence of bases that signals RNA polymerase 
to associate with the DNA and to initiate the transcription of mRNA using one of the DNA strands as a template to make 
a corresponding strand of RNA.

[0045] A number of promoters which are active in plant cells have been described in the literature. These include 
the nopaline synthase (NOS) and octopine synthase (OCS) promoters (which are carried on tumor-inducing plasmids 
of Agrobacterium tumefaciens), the Cauliflower Mosaic Virus (CaMV) 19S and 35S promoters, the light-inducible 
promoter from the small subunit of ribulose bis-phosphate carboxylase (ssRUBISCO, a very abundant plant 
polypeptide) and the mannopine synthase (MAS) promoter (Velten et al., 1984 and Velten & Schell, 1985). All of these 
promoters have been used to create various types of DNA constructs which have been expressed in plants (see e.g., 
PCT publication WO84/02913 (Rogers et al., Monsanto)).

[0046] Promoters which are known or are found to cause transcription of RNA in plant cells can be used in the present 
invention. Such promoters may be obtained from plants or plant viruses and include, but are not limited to, the CaMV35S 
promoter and promoters isolated from plant genes such as ssRUBISCO genes. As described below, it is preferred that 
the particular promoter selected should be capable of causing sufficient expression to result in the production of an 
effective amount of protein.

[0047] The promoters used in the DNA constructs (i.e. chimeric plant genes) of the present invention may be modified, 
if desired, to affect their control characteristics. For example, the CaMV35S promoter may be ligated to the portion of 
the ssRUBISCO gene that represses the expression of ssRUBISCO in the absence of light, to create a promoter which 
is active in leaves but not in roots. The resulting chimeric promoter may be used as described herein. For purposes of 
this description, the phrase "CaMV35S" promoter thus includes variations of CaMV35S promoter, e.g., promoters 
derived by means of ligation with operator regions, random or controlled mutagenesis, etc. Furthermore, the promoters 
may be altered to contain multiple "enhancer sequences" to assist in elevating gene expression.
[0048] The RNA produced by a DNA construct of the present invention also contains a 5' non-translated leader 
sequence. This sequence can be derived from the promoter selected to express the gene, and can be specifically 
modified so as to increase translation of the mRNA. The 5' non-translated regions can also be obtained from viral 
RNAs, from suitable eukaryotic genes, or from a synthetic gene sequence. The present invention is not limited to 
constructs as presented in the following examples. Rather, the non-translated leader sequence can be part of the 5' 
end of the non-translated region of the coding sequence for the virus coat protein, or part of the promoter sequence, 
or can be derived from an unrelated promoter or coding sequence. In any case, it is preferred that the sequence 
flanking the initiation site conform to the translational consensus sequence rules for enhanced translation initiation 
reported by Kozak (1984).
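Kozak's consensus is commonly summarised as GCC(A/G)CCATGG, with a purine at position -3 and a G at position +4 usually described as the most influential positions; plant initiation contexts differ somewhat, so this is only a rough screen. A minimal sketch (hypothetical function name) that checks just those two positions:

```python
def kozak_context(seq: str, atg: int) -> dict:
    """Check two widely cited Kozak positions around the ATG starting at index `atg`.

    Numbering follows the usual convention: the A of ATG is +1, so position -3 is
    seq[atg - 3] and position +4 is seq[atg + 3].
    """
    seq = seq.upper()
    if seq[atg:atg + 3] != "ATG":
        raise ValueError(f"no ATG at index {atg}")
    minus3 = seq[atg - 3] if atg >= 3 else None
    plus4 = seq[atg + 3] if atg + 3 < len(seq) else None
    return {"purine_at_minus3": minus3 in ("A", "G"), "g_at_plus4": plus4 == "G"}
```

A leader that fails both checks would be a candidate for modification when the 5' non-translated region is designed.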

[0049] The DNA construct of the present invention also contains a modified or fully synthetic structural coding 
sequence encoding the crystal toxin protein of Bacillus thuringiensis, which has been changed to enhance the 
performance of the gene in plants. The structural genes of the present invention may optionally encode a fusion 
protein comprising an amino-terminal chloroplast transit peptide or secretory signal sequence (see, for instance, 
Examples 10 and 11).
[0050] The DNA construct also contains a 3' non-translated region. The 3' non-translated region contains a 
polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the 
viral RNA. Examples of suitable 3' regions are (1) the 3' transcribed, non-translated regions containing the 
polyadenylation signal of Agrobacterium tumor-inducing (Ti) plasmid genes, such as the nopaline synthase (NOS) gene, 
and (2) plant genes like the soybean storage protein (7S) genes and the small subunit of the RuBP carboxylase (E9) 
gene. An example of a preferred 3' region is that from the 7S gene, described in greater detail in the examples below.



Plant Transformation



[0051] A chimeric plant gene containing a structural coding sequence of the present invention can be inserted into 
the genome of a plant by any suitable method. Suitable plants for use in the practice of the present invention include, 
but are not limited to, soybean, cotton, alfalfa, oilseed rape, flax, tomato, sugarbeet, sunflower, potato, tobacco, maize, 
rice and wheat. Suitable plant transformation vectors include those derived from a Ti plasmid of Agrobacterium 
tumefaciens, as well as those disclosed, e.g., by Herrera-Estrella (1983), Bevan (1983), Klee (1985) and EPO 
publication 120,516 (Schilperoort et al.). In addition to plant transformation vectors derived from the Ti or root-inducing 
(Ri) plasmids of Agrobacterium, alternative methods can be used to insert the DNA constructs of this invention into 
plant cells. Such methods may involve, for example, the use of liposomes, electroporation, chemicals that increase free 
DNA uptake, free DNA delivery via microprojectile bombardment, and transformation using viruses or pollen.

[0052] A particularly useful Ti plasmid cassette vector for transformation of dicotyledonous plants is shown in 
Figure 5. Referring to Figure 5, the expression cassette pMON893 consists of the enhanced CaMV35S promoter 
(EN 35S) and the 3' end, including polyadenylation signals, from a soybean gene encoding the alpha-prime subunit of 
beta-conglycinin. Between these two elements is a multilinker containing multiple restriction sites for the insertion of 
genes.

[0053] The enhanced CaMV35S promoter was constructed as follows. A fragment of the CaMV35S promoter extending 
between position -343 and +9 was previously constructed in pUC13 by Odell et al. (1985). This segment contains a 
region identified by Odell et al. (1985) as being necessary for maximal expression of the CaMV35S promoter. It was 
excised as a ClaI-HindIII fragment, made blunt ended with DNA polymerase I (Klenow fragment) and inserted into the 
HincII site of pUC18. This upstream region of the 35S promoter was excised from this plasmid as a HindIII-EcoRV 
fragment (extending from -343 to -90) and inserted into the same plasmid between the HindIII and PstI sites. The 
enhanced CaMV35S promoter thus contains a duplication of sequences between -343 and -90 (Kay et al., 1987).
[0054] The 3' end of the 7S gene is derived from the 7S gene contained on the clone designated 17.1 (Schuler et 
al., 1982). This 3' end fragment, which includes the polyadenylation signals, extends from an AvaII site located about 
30 bp upstream of the termination codon for the beta-conglycinin gene in clone 17.1 to an EcoRI site located about 
450 bp downstream of this termination codon.

[0055] The remainder of pMON893 contains a segment of pBR322 which provides an origin of replication in E. coli 
and a region for homologous recombination with the disarmed T-DNA in Agrobacterium strain ACO (described below); 
the oriV region from the broad host range plasmid RK2; the streptomycin/spectinomycin resistance gene from Tn7; 
and a chimeric NPTII gene, containing the CaMV35S promoter and the nopaline synthase (NOS) 3' end, which 
provides kanamycin resistance in transformed plant cells.

[0056] Referring to Figure 6, transformation vector plasmid pMON900 is a derivative of pMON893. The enhanced 
CaMV35S promoter of pMON893 has been replaced with the 1.5 kb mannopine synthase (MAS) promoter (Velten et 
al., 1984). The other segments are the same as in plasmid pMON893. After incorporation of a DNA construct into 
plasmid vector pMON893 or pMON900, the intermediate vector is introduced into A. tumefaciens strain ACO, which 
contains a disarmed Ti plasmid. Cointegrate Ti plasmid vectors are selected and used to transform dicotyledonous 
plants.
[0057] Referring to Figure 7, A. tumefaciens ACO is a disarmed strain similar to pTiB6SE described by Fraley et al. 
(1985). For construction of ACO the starting Agrobacterium strain was the strain A208, which contains a nopaline-type 
Ti plasmid. The Ti plasmid was disarmed in a manner similar to that described by Fraley et al. (1985) so that essentially 
all of the native T-DNA was removed except for the left border and a few hundred base pairs of T-DNA inside the left 
border. The remainder of the T-DNA extending to a point just beyond the right border was replaced with a novel piece 
of DNA including (from left to right) a segment of pBR322, the oriV region from plasmid RK2, and the kanamycin 
resistance gene from Tn601. The pBR322 and oriV segments are similar to the segments in pMON893 and provide a 
region of homology for cointegrate formation.

[0058] The following examples are provided to better elucidate the practice of the present invention and should not 
be interpreted in any way to limit the scope of the present invention. Those skilled in the art will recognize that various 
modifications, truncations etc. can be made to the methods and genes described herein while not departing from the 
spirit and scope of the present invention. 

Example 1 Modified B.t.k. HD-1 Gene

[0059] Referring to Figure 2, the wild-type B.t.k. HD-1 gene is known to be expressed poorly in plants as a full-length 
gene or as a truncated gene. The G+C content of the B.t.k. gene is low (37%), and the gene contains many A+T rich 
regions, potential polyadenylation sites (18 sites; see Table II for the list of sequences) and numerous ATTTA sequences.



Table II

List of Sequences of the Potential Polyadenylation Signals

AATAAA*     AAGCAT
AATAAT*     ATTAAT
AACCAA      ATACAT
ATATAA      AAAATA
AATCAA      ATTAAA**
ATACTA      AATTAA**
ATAAAA      AATACA**
ATGAAA      CATAAA**

* indicates a potential major plant polyadenylation site.
** indicates a potential minor animal polyadenylation site.
All others are potential minor plant polyadenylation sites.



[0060] Table III lists the synthetic oligonucleotides designed and synthesized for the site-directed mutagenesis of 
the B.t.k. HD-1 gene.
Table III

Mutagenesis Primers for B.t.k. HD-1 Gene

Primer      Length (bp)   Sequence
BTK185      18            TCCCCAGATA ATATCAAC
BTK240      48            GGCTTGATTC CTAGCGAACT CTTCGATTCT CTGGTTGATG
                          AGCTGTTC
BTK462      54            CAAAACTGAG AGGTGGAGGT TGGCAGCTTG AACGTACACG
                          GAGAGGAGAG GAAC
BTK669      48            AGTTAGTGTA AGCTCTCTTC TGAACTGGTT GTACCTGATC
                          CAATCTCT
BTK930      39            AGCCATGATC TGGTGACCGG ACCAGTAGTA TTCTCCTCT
BTK1110     32            AGTTGTTGGT TGTTGATCCC GATGTTAAAA GG
BTK1380A    37            GTGATGAAGG GATGATGTTG TTGAACTCAG CACTACG
BTK1380T    100           CAGAAGTTCC AGAGCCAAGA TTAGTAGACT TGGTGAGTGG
                          GATTTGGGTG ATTTGTGATG AAGGGATGAT GTTGTTGAAC
                          TCAGCACTAC GATGTATCCA
BTK1600     27            TGATGTGTGG AACTGAAGGT TTGTGGT



[0061] The B.t.k. HD-1 gene (BglII fragment from pMON9921 encoding amino acids 29-607 with a Met-Ala at the 
N-terminus) was cloned into pMON7258 (a pUC118 derivative which contains a BglII site in the multilinker cloning region) 
at the BglII site, resulting in pMON5342. The orientation of the B.t.k. gene was chosen so that the opposite strand 
(negative strand) was synthesized in filamentous phage particles for the mutagenesis. The procedure of Kunkel (1985) 
was used for the mutagenesis using plasmid pMON5342 as starting material.

[0062] The regions for mutagenesis were selected in the following manner. All regions of the DNA sequence of the 
B.t.k. gene were identified which contained five or more consecutive base pairs which were A or T. These were ranked 
in terms of length and highest percentage of A+T in the surrounding sequence over a 20-30 base pair region. The DNA 
was then analyzed for regions which might contain polyadenylation sites (see Table II above) or ATTTA sequences. 
Oligonucleotides were designed which maximized the elimination of A+T consecutive regions which contained one or 
more polyadenylation sites or ATTTA sequences. Two potential plant polyadenylation sites were rated more critical 
(see Table II) based on published reports. Codons were selected which increased G+C content, did not generate 
restriction sites for enzymes useful for cloning and assembly of the modified gene (BamHI, BglII, SacI, NcoI, EcoRV) 
and did not contain the doublets TA or CG which have been reported to be infrequently found in codons in plants. The 
oligonucleotides were at least 18 bp long, ranging up to 100 base pairs, and contained at least 5-8 base pairs of direct 
homology to native sequences at the ends of the fragments for efficient hybridization and priming in site-directed 
mutagenesis reactions. Figure 2 compares the wild-type B.t.k. HD-1 gene sequence with the sequence which resulted 
from the modifications by site-directed mutagenesis.
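The ranking step can be sketched as follows. The text does not say how length and surrounding A+T percentage were combined, so this sketch assumes length is the primary sort key and surrounding A+T content breaks ties, with a 25 bp window centred on each run (within the 20-30 bp range given). All names are hypothetical.

```python
import re


def rank_at_regions(seq: str, min_len: int = 5, window: int = 25) -> list:
    """Rank runs of >= `min_len` consecutive A/T bases, longest and most A+T-rich first.

    `window` is the total size of the surrounding region, centred on the run;
    runs longer than the window are scored on the run itself.
    """
    seq = seq.upper()
    ranked = []
    for m in re.finditer(f"[AT]{{{min_len},}}", seq):
        start, end = m.span()
        pad = max((window - (end - start)) // 2, 0)
        region = seq[max(start - pad, 0):min(end + pad, len(seq))]
        ranked.append({
            "start": start,
            "length": end - start,
            "surrounding_at": sum(base in "AT" for base in region) / len(region),
        })
    ranked.sort(key=lambda r: (r["length"], r["surrounding_at"]), reverse=True)
    return ranked
```

The top-ranked regions would then be checked for Table II signals and ATTTA before oligonucleotides are designed.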

[0063] The end result of these changes was to increase the G+C content of the B.t.k. gene from 37% to 41% while also 
decreasing the potential plant polyadenylation sites from 18 to 7 and decreasing the ATTTA regions from 13 to 7. 
Specifically, the mutagenesis changes from the amino (5') terminus to the carboxy (3') terminus are as follows:
[0064] BTK185 is an 18-mer used to eliminate a plant polyadenylation site in the midst of a nine base pair region of 
A+T.

[0065] BTK240 is a 48-mer. Seven base pairs were changed by this oligonucleotide to eliminate three potential 
polyadenylation sites (2 AACCAA, 1 AATTAA). Another region close to the region altered by BTK240, starting at bp 
312, had a high A+T content (13 of 15 base pairs) and an ATTTA region. However, it did not contain a potential 
polyadenylation site and its longest string of uninterrupted A+T was seven base pairs.

[0066] BTK462 is a 54-mer introducing 13 base pair changes. The first six changes were to reduce the A+T richness 
of the gene by replacing wild-type codons with codons containing G and C while avoiding the CG doublet. The next 
seven changes made by BTK462 were used to eliminate an A+T rich region (13 of 14 base pairs were A or T) containing 
two ATTTA regions.

[0067] BTK669 is a 48-mer making nine individual base pair changes eliminating three possible polyadenylation 
sites (ATATAA, AATCAA, and AATTAA) and a single ATTTA site. 

[0068] BTK930 is a 39-mer designed to increase the G+C content and to eliminate a potential polyadenylation site 
(AATAAT, a major site). This region did contain a nine base pair region of consecutive A+T sequence. One of the base 
pair changes was a G to A because a G at this position would have created a G+C rich region (CCGG(G)C). Since 
sequencing reactions indicate that there can be difficulties generating sequence through consecutive G+C bases, it 
was thought to be prudent to avoid generating potentially problematic regions even if they were problematic only in vitro.

[0069] BTK1110 is a 32-mer designed to introduce five changes in the wild-type gene. One potential site (AATAAT, 
a major site) was eliminated in the midst of an A+T rich region (19 of 22 base pairs).

[0070] BTK1380A and BTK1380T are responsible for 14 individual base pair changes. The first region (1380A) has 
17 consecutive A+T base pairs. In this region are an ATTTA and a potential polyadenylation site (AATAAT). The 
100-mer (1380T) contains all the changes dictated by 1380A. The large size of this primer was in part an experiment 
to determine whether it was feasible to utilize large oligonucleotides (over 60 bases in length) for mutagenesis. A 
second consideration was that the 100-mer was used to mutagenize a template which had previously been 
mutagenized by 1380A. The original primer ordered to mutagenize the region downstream and adjacent to 1380A did 
not anneal efficiently to the desired site, as indicated by an inability to obtain clean sequence utilizing the primer. The 
large region of homology of 1380T did assure proper annealing. The extended size of 1380T was more of a convenience 
than a necessity. The second region adjacent to 1380A covered by 1380T has a high A+T content (22 of 29 bases are 
A or T).

[0071] BTK1600 is a 27-mer responsible for five individual base pair changes. An ATTTA region and a plant 
polyadenylation site were identified and the appropriate changes engineered.

[0072] A total of 62 bases were changed by site-directed mutagenesis. The G+C content increased by 55 base pairs, 
the potential polyadenylation sites were reduced from 18 to seven and the ATTTA sequences decreased from 13 to 
seven. The changes in the DNA sequence resulted in changes in 55 of the 579 codons in the truncated B.t.k. gene in 
pMON5342 (approximately 9.5%).
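The percentage of codons changed follows from a simple ratio; the second ratio, the fraction of bases altered, assumes for illustration that the truncated coding region is exactly 3 × 579 = 1737 bp.

```latex
\[
\frac{55}{579} \approx 0.095 \;(9.5\%\ \text{of codons}), \qquad
\frac{62}{3 \times 579} = \frac{62}{1737} \approx 0.036 \;(3.6\%\ \text{of bases})
\]
```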

[0073] Referring to Table IV, modified B.t.k. HD-1 genes were constructed that contained all of the above modifications 
(pMON5370) or various subsets of individual modifications. These genes were inserted into pMON893 for plant 
transformation, and tobacco plants containing these genes were analyzed. The analysis of tobacco plants with the 
individual modifications was undertaken for several reasons. Expression of the wild-type truncated gene in tobacco is 
very poor, resulting in infrequent identification of plants toxic to tobacco hornworm (THW). Toxicity is defined by leaf 
feeding assays as at least 60% mortality of tobacco hornworm neonate larvae with a damage rating of 1 or less (the 
scale is 0 to 4; 0 is equivalent to total protection, 4 to total damage). The modified HD-1 gene (pMON5370) shows a 
large increase in expression (estimated to be approximately 100-fold; see Table VIII) in tobacco. Therefore, increases 
in expression of the wild-type gene due to individual modifications would be apparent as a large increase in the 
frequency of toxic tobacco plants and the presence of detectable B.t.k. protein. Results are shown in the following table:

Table IV

Relative Effects of Regional Modifications within the B.t.k. Gene

Construct    Position Modified                          # of Plants   # of Toxic Plants
pMON5370     185, 240, 669, 930, 1110, 1380a+b, 1600    38            22
pMON10707    185, 240, 462, 669                         48            19
pMON10706    930, 1110, 1380a+b, 1600                   43            1
pMON10539    185                                        55            2
pMON10537    240                                        57            17
pMON10540    185, 240                                   88            23
pMON10705    462                                        47            1
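The toxicity criterion used to score the plants in Table IV can be written as a simple predicate. This is a sketch only: the function name `is_toxic` and the example readings are hypothetical, and the thresholds are the ones stated in paragraph [0073].

```python
def is_toxic(mortality_percent, damage_rating):
    """Leaf-feeding toxicity criterion: at least 60% mortality of tobacco
    hornworm neonate larvae AND a damage rating of 1 or less, where the
    rating runs from 0 (total protection) to 4 (total damage)."""
    if not 0 <= damage_rating <= 4:
        raise ValueError("damage rating must lie on the 0 to 4 scale")
    if not 0 <= mortality_percent <= 100:
        raise ValueError("mortality must be a percentage")
    return mortality_percent >= 60 and damage_rating <= 1


# Hypothetical assay readings for three plants.
for mortality, rating in [(75, 1), (90, 2), (55, 0)]:
    print(mortality, rating, is_toxic(mortality, rating))
# prints: 75 1 True, then 90 2 False, then 55 0 False
```

Both conditions must hold, so high mortality with heavy leaf damage (or light damage with low mortality) does not count as a toxic plant.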



[0074] The effects of the changes introduced by each individual oligonucleotide on expression did reveal some overall trends. Six different constructs were generated which were designed to identify the key regions. The nine different oligonucleotides were divided in half by their position on the gene. Changes in the N-terminal half were incorporated into pMON10707 (185, 240, 462, 669). C-terminal half changes were incorporated into pMON10706 (930, 1110, 1380a+b, 1600). The results of analysis of plants with these two constructs indicate that pMON10707 produces a substantial number of toxic plants (19 of 48). Protein from these plants is detectable by ELISA analysis. pMON10706 plants were rarely identified as insecticidal (1 of 43), and the levels of B.t.k. protein were barely detectable by immunological analysis. The N-terminal changes were investigated in greater detail with four constructs: pMON10539 (185 alone), pMON10537 (240 alone), pMON10540 (185 and 240) and pMON10705 (462 alone). The results indicate that the changes in the 240 region were required to generate a substantial number of toxic plants (pMON10540, 23 of 88; pMON10537, 17 of 57). The absence of the 240 changes resulted in a low frequency of toxic plants with low B.t.k. protein levels, identical to results with the wild-type gene. These results indicate that the changes in the 240 region are responsible for a substantial increase in B.t.k. expression levels over an analogous wild-type construct in tobacco. Changes in additional regions (185, 462, 669) in conjunction with 240 may result in further increases in B.t.k. expression (>2-fold). However, it is the changes at the 240 region of the N-terminal portion of the gene that result in dramatic increases in expression.
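The trend described above can be checked directly against the counts in Table IV by grouping constructs according to whether they carry the 240 changes. A short sketch follows; the dictionary and function names are hypothetical, while the counts and region lists are transcribed from Table IV as printed.

```python
# Counts transcribed from Table IV as printed:
# construct -> (regions modified, plants analyzed, toxic plants)
TABLE_IV = {
    "pMON5370": ({"185", "240", "669", "930", "1110", "1380a+b", "1600"}, 38, 22),
    "pMON10707": ({"185", "240", "462", "669"}, 48, 19),
    "pMON10706": ({"930", "1110", "1380a+b", "1600"}, 43, 1),
    "pMON10539": ({"185"}, 55, 2),
    "pMON10537": ({"240"}, 57, 17),
    "pMON10540": ({"185", "240"}, 88, 23),
    "pMON10705": ({"462"}, 47, 1),
}


def toxic_frequency_by_region(table, region):
    """Split constructs by whether they carry `region` and return the
    fraction of toxic plants for each construct in both groups."""
    with_region, without_region = {}, {}
    for name, (regions, plants, toxic) in table.items():
        target = with_region if region in regions else without_region
        target[name] = toxic / plants
    return with_region, without_region


with_240, without_240 = toxic_frequency_by_region(TABLE_IV, "240")
for label, group in (("with 240", with_240), ("without 240", without_240)):
    for name, freq in sorted(group.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{label:12} {name:10} {freq:.1%}")
```

Run on these counts, every construct carrying the 240 changes gives more than 25% toxic plants, while every construct lacking them stays below 4%.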

[0075] Despite the importance of the alteration of the 240 region in expression of modified genes, increased expression can be achieved by alteration of other regions. Hybrid genes, part wild-type and part synthetic, were generated to determine the effects of synthetic gene segments on the levels of B.t.k. expression. A hybrid gene was generated with a synthetic N-terminal third (base pairs 1 to 590 of Figure 2, up to the XbaI site) and the C-terminal wild-type B.t.k. HD-1 sequence (pMON5378). Plants transformed with this vector were as toxic as plants transformed with the modified HD-1 gene (pMON5370). This is consistent with the alteration of the 240 region. However, pMON10538, a hybrid with a wild-type N-terminal third (wild-type gene for the first 600 base pairs, up to the second XbaI site) and a synthetic C-terminal two-thirds (base pairs 590 to 1845 of Figure 3), was used to transform tobacco and resulted in a dramatic increase in expression. The levels of expression do not appear to be as high as those seen with the synthetic gene, but are comparable to the modified gene levels. These results indicate that modification of the 240 segment is not essential to increased expression, since pMON10538 has an intact 240 region. A fully synthetic gene is, in most cases, superior for expression levels of B.t.k. (see Example 2).

Example 2 -- Fully Synthetic B.t.k. HD-1 Gene

[0076] A synthetic B.t.k. HD-1 gene was designed using the preferred plant codons listed in Table V below. Table V lists the codons and their frequency of use in genes of dicotyledonous plants, compared to their frequency of use in the wild-type B.t.k. HD-1 gene (amino acids 1-615) and in the synthetic gene of this example. The total number of each amino acid in this segment of the gene is listed in parentheses next to the amino acid designation.



Table V

Codon Usage in Synthetic B.t.k. HD-1 Gene
(percent usage in plants / wild-type B.t.k. HD-1 / synthetic gene)

Amino Acid  Codon   Plants   Wt B.t.k.   Syn
ARG (43)    CGA          7          11     2
            CGC         11           5     5
            CGG          5           2     0
            CGU         25          14    27
            AGA         29          55    41
            AGG         23          14    25
LEU (49)    CUA          8          16     4
            CUC         20           0    20
            CUG         10           2     6
            CUU         28          22    24
            UUA          5          50     0
            UUG         30          10    45
SER (64)    UCA         14          27     5
            UCC         26           9    28
            UCG          3           8     0
            UCU         21          19    31
            AGC         21           6    32
            AGU         15          31     5
THR (42)    ACA         21          31    14
            ACC         41          19    53
            ACG          7          14     0
            ACU         31          36    33
PRO (34)    CCA         45          35    53
            CCC         19           6    12
            CCG          9          21     3
            CCU         26          38    32
ALA (31)    GCA         23          38    26
            GCC         32           9    29
            GCG          3           3     0
            GCU         41          50    45
GLY (46)    GGA         32          52    45
            GGC         20          17    15
            GGG         11          15     6
            GGU         37          15    34
ILE (46)    AUA         12          39     2
            AUC         45          11    67
            AUU         43          50    30
VAL (38)    GUA          9          45     3
            GUC         20           5    16
            GUG         28          11    37
            GUU         43          39    45
LYS (3)     AAA         36         100    33
            AAG         64           0    67
ASN (44)    AAC         72          27    80
            AAU         28          73    20
GLN (31)    CAA         64          77    61
            CAG         36          23    39
HIS (10)    CAC         65           0    80
            CAU         35         100    20
GLU (30)    GAA         48          87    50
            GAG         52          13    50
ASP (23)    GAC         48          17    65
            GAU         52          83    35
TYR (25)    UAC         68          20    72
            UAU         32          80    28
CYS (2)     UGC         78          50   100
            UGU         22          50     0
PHE (36)    UUC         56          17    83
            UUU         44          83    17
MET (9)     AUG        100         100   100
TRP (9)     UGG        100         100   100
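The percentages in Table V are per-amino-acid frequencies: for each amino acid, the share of its occurrences encoded by each synonymous codon. A minimal sketch of that calculation follows. It assumes a plain DNA or RNA coding sequence read in frame from its first base and the standard genetic code, and it uses one-letter amino acid codes instead of the table's three-letter codes. The function name and toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_usage_percent(cds):
    """Return ({amino acid: {codon: percent}}, {amino acid: count}) for a
    coding sequence, with codons written in RNA form as in Table V."""
    cds = cds.upper().replace("U", "T")
    counts = defaultdict(Counter)
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        aa = GENETIC_CODE.get(codon)
        if aa is None or aa == "*":
            continue  # skip ambiguous codons and stop codons
        counts[aa][codon] += 1
    usage, totals = {}, {}
    for aa, codon_counts in counts.items():
        total = sum(codon_counts.values())
        totals[aa] = total
        usage[aa] = {codon.replace("T", "U"): round(100 * n / total)
                     for codon, n in codon_counts.items()}
    return usage, totals


usage, totals = codon_usage_percent("ATGAAAAAGAAATGGTAA")
print(usage["K"], totals["K"])   # {'AAA': 67, 'AAG': 33} 3

# The per-amino-acid counts printed in Table V sum to the 615-residue segment.
TABLE_V_COUNTS = [43, 49, 64, 42, 34, 31, 46, 46, 38, 3, 44, 31, 10, 30, 23, 25, 2, 36, 9, 9]
print(sum(TABLE_V_COUNTS))       # 615
```

Rounding to whole percentages is why some columns in the table sum to 99 or 101 rather than exactly 100.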











[0077] The resulting synthetic gene lacks ATTTA sequences, contains only one potential polyadenylation site and 
has a G+C content of 48.5%. Figure 3 is a comparison of the wild-type HD-1 sequence to the synthetic gene sequence 
for amino acids 1-615. There is approximately 77% DNA homology between the synthetic gene and the wild-type gene 
and 356 of the 615 codons have been changed (approximately 60%). 
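Three of the figures in paragraph [0077] (G+C content, ATTTA count, and DNA homology) are simple sequence statistics. A sketch of how they might be computed follows. It assumes that "homology" means ungapped identity between two sequences of equal length that are already aligned codon for codon. The function names are hypothetical.

```python
def gc_content(seq):
    """Percent G+C of a DNA sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def count_motif(seq, motif):
    """Count occurrences of `motif`, including overlapping ones."""
    seq, motif = seq.upper(), motif.upper()
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def percent_identity(a, b):
    """Ungapped percent identity of two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return 100 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


print(gc_content("GGCCAATT"))               # 50.0
print(count_motif("ATTTATTTA", "ATTTA"))    # 2 (the two copies share one A)
print(percent_identity("ACGT", "ACGA"))     # 75.0
```

Overlapping matches are counted deliberately, since two ATTTA motifs can share a base and both would have to be removed.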



Example 3 -- Synthetic B.t.k. HD-73 Gene

[0078] The crystal protein toxin from B.t.k. HD-73 exhibits a higher unit activity against some important agricultural pests. The toxin proteins of HD-1 and HD-73 exhibit substantial homology (approximately 90%) in the N-terminal 450 amino acids, but differ substantially in the amino acid region 451-615. Fusion proteins comprising amino acids 1-450 of HD-1 and 451-615 of HD-73 exhibit the insecticidal properties of wild-type HD-73. The strategy employed was to use the 5' two-thirds of the synthetic HD-1 gene (first 1350 bases, up to the SacI site) and to dramatically modify the final 590 bases (through amino acid 645) of the HD-73 gene in a manner consistent with the algorithm used to design the synthetic HD-1 gene. Table VI below lists the oligonucleotides used to modify the HD-73 gene, in the order in which they occur in the gene from the 5' to the 3' end. Nine oligonucleotides were used in a 590 base pair region, each oligonucleotide ranging in size from 33 to 60 bases. The only regions left unchanged were areas where there were no long consecutive strings of A or T bases (longer than six). All polyadenylation sites and ATTTA sites were eliminated.
Table VI

Mutagenesis Primers for B.t.k. HD-73

Primer     Length (bp)   Sequence
73K1363    51            AATACTATCG GATGCGATGA TGTTGTTGAA CTCAGCACTA CGGTGTATCC A
73K1437    33            TCCTGAAATG ACAGAACCGT TGAAGAGAAA GTT
73K1471    48            ATTTCCACTG CTGTTGAGTC TAACGAGGTC TCCACCAGTG AATCCTGG
73K1561    60            GTGAATAGGG GTCACAGAAG CATACCTCAC ACGAACTCTA TATCTGGTAG ATGTTGGATGG
73K1642    33            TGTAGCTGGA ACTGTATTGG AGAAGATGGA TGA
73K1675    48            TTCAAAGTAA CCGAAATCGC TGGATTGGAG ATTATCCAAG GAGGTAGC
73K1741    39            ACTAAAGTTT CTAACACCCA CGATGTTACC GAGTGAAGA
73K1797    36            AACTGGAATG AACTCGAATC TGTCGATAAT CACTCC
73KTERM    54            GGACACTAGA TCTTAGTGAT AATCGGTCAC ATTTGTCTTG AGTCCAAGCT GGTT

[0079] The resulting gene has two potential polyadenylation sites (compared to 18 in the wild type) and no ATTTA sequences (12 in the wild type). The G+C content has increased from 37% to 48%. A total of 59 individual base pair changes were made using the primers in Table VI. Overall, there is 90% DNA homology between the region of the HD-73 gene modified by site-directed mutagenesis and the wild-type sequence of the analogous region of HD-73. The synthetic HD-73 gene is a hybrid of the first 1360 bases from the synthetic HD-1 gene and the next 590 or so bases of modified HD-73 sequence. Figure 4 is a comparison of the above-described synthetic B.t.k. HD-73 gene and the wild-type B.t.k. HD-73 gene encoding amino acids 1-645. In the modified region of the HD-73 gene, 44 of the 170 codons (25%) were changed as a result of the site-directed mutagenesis with the oligonucleotides in Table VI. Overall, approximately 50% of the codons in the synthetic B.t.k. HD-73 gene differ from the analogous segment of the wild-type HD-73 gene.
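The design rules summarized in paragraphs [0078] and [0079] (no ATTTA motifs, no polyadenylation signals, no runs of more than six A or T bases) lend themselves to an automated scan. The sketch below is an illustration under stated assumptions. It treats only AATAAA as a default polyadenylation signal, whereas the algorithm of Figure 1 uses its own, longer list that is not reproduced here. The function name `scan_design_features` and the example sequence are hypothetical.

```python
import re


def scan_design_features(seq, polya_signals=("AATAAA",)):
    """Report 0-based positions of features the design aimed to remove:
    ATTTA motifs, candidate polyadenylation signals, and runs of more than
    six consecutive A/T bases (reported as (start, length))."""
    seq = seq.upper()

    def starts(motif):
        # Zero-width lookahead so that overlapping hits are all reported.
        return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]

    return {
        "ATTTA": starts("ATTTA"),
        "polyA_signals": sorted(p for s in polya_signals for p in starts(s.upper())),
        "AT_runs_over_6": [(m.start(), len(m.group())) for m in re.finditer(r"[AT]{7,}", seq)],
    }


example = "GG" + "AATAAA" + "GGC" + "ATTTA" + "GGC" + "ATATATTA" + "GG"
print(scan_design_features(example))
# {'ATTTA': [11], 'polyA_signals': [2], 'AT_runs_over_6': [(19, 8)]}
```

Passing a longer tuple as `polya_signals` is how a fuller motif list, such as the one in Figure 1, could be plugged in.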
[0080] A one base pair deletion in the synthetic HD-73 gene, at base pair 1890, was detected in the course of sequencing the 3' end. This results in a frame-shift mutation at amino acid 625, with a premature stop codon at amino acid 640 (pMON5379). Table VII below compares the codon usage of the wild-type B.t.k. HD-73 gene with that of the synthetic gene of this example for amino acids 451-645, together with the codon usage of naturally occurring genes of dicotyledonous plants. The total number of each amino acid encoded in this segment of the gene is given in parentheses next to the amino acid designation.



Table VII

Codon Usage in Synthetic B.t.k. HD-73 Gene
(percent usage in plants / wild-type HD-73 / synthetic gene)

Amino Acid  Codon   Plants    Wt HD-73   Syn
ARG (10)    CGA          7          10     0
            CGC         11           0     8
            CGG          5          10     0
            CGU         25          20    23
            AGA         29          60    62
            AGG         23           0     8
LEU (12)    CUA          8          25     8
            CUC         20          17    58
            CUG         10          17     8
            CUU         28           8     0
            UUA          5          33     8
            UUG         30           0    17
SER (21)    UCA         14          24    18
            UCC         26          10    27
            UCG          3          10     0
            UCU         21          24    18
            AGC         21           0    14
            AGU         15           ?     ?
THR (15)    ACA         21          47    38
            ACC         41          13    31
            ACG          7          13     0
            ACU         31          27    31
PRO (7)     CCA         45          71    71
            CCC         19           0     0
            CCG          9          14     0
            CCU         26          14    29
ALA (14)    GCA         23          29    31
            GCC         32           7     8
            GCG          3          21    15
            GCU         41          43    46
GLY (15)    GGA         32          33    43
            GGC         20           0     0
            GGG         11          27    14
            GGU         37          40    43
ILE (15)    AUA         12          33     7
            AUC         45           7    40
            AUU         43          60    53
VAL (15)    GUA          9          40     7
            GUC         20           0     7
            GUG         28          20    36
            GUU         43          40    50
LYS (3)     AAA         36          67   100
            AAG         64          33     0
ASN (20)    AAC         72          20    53
            AAU         28          80    47
GLN (5)     CAA         64          60    67
            CAG         36          40    33
HIS (3)     CAC         65          67   100
            CAU         35          33     0
GLU (7)     GAA         48          86    57
            GAG         52          14    43
ASP (5)     GAC         48          40    50
            GAU         52          60    50
TYR (5)     UAC         68           0    20
            UAU         32         100    80
CYS (0)     UGC         78           0     0
            UGU         22           0     0
PHE (13)    UUC         56           8    67
            UUU         44          92    33
MET (2)     AUG        100         100   100
TRP (2)     UGG        100         100   100

? Value illegible in the source.
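The effect of the one-base deletion described in paragraph [0080] can be illustrated with a toy open reading frame. Removing a single base shifts every downstream codon and can bring an out-of-frame stop codon into frame. The 18-nucleotide sequence below is hypothetical and unrelated to the HD-73 sequence; only the mechanism is the same. The helper names `translate` and `delete_base` are also hypothetical.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate from the first base, stopping at (and including) the
    first stop codon, written as '*'."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = GENETIC_CODE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def delete_base(seq, position):
    """Remove the single base at 0-based `position`."""
    return seq[:position] + seq[position + 1:]


original = "ATGGCTCTAAGCGCTTAA"      # hypothetical toy ORF
shifted = delete_base(original, 6)
print(translate(original))   # MALSA*
print(translate(shifted))    # MA*
```

Deleting the C at position 6 turns the in-frame CTA codon into the start of TAA, so translation terminates two residues after the start codon.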

[0081] Another truncated synthetic HD-73 gene was constructed. The sequence of this synthetic HD-73 gene is 
identical to that of the above synthetic HD-73 gene in the region in which they overlap (amino acids 29-615), and it 
also encodes Met-Ala at the N-terminus. Figure 8 shows a comparison of this truncated synthetic HD-73 gene with the 
N-terminal Met-Ala versus the wild-type HD-73 gene. 
[0082] While the previous examples have been directed at the preparation of synthetic and modified genes encoding truncated B.t.k. proteins, synthetic or modified genes can also be prepared which encode full length toxin proteins.
[0083] One full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotide 1-1845 plus wild-type HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. Figure 9 shows a comparison of this synthetic/wild-type full length HD-73 gene versus the wild-type full length HD-73 gene.
[0084] Another full length B.t.k. gene consists of the synthetic HD-73 sequence of Figure 4 from nucleotide 1-1845 plus a modified HD-73 sequence encoding amino acids 616 to the C-terminus of the native protein. The C-terminal portion has been modified by site-directed mutagenesis to remove putative polyadenylation signals and ATTTA sequences according to the algorithm of Figure 1. Figure 10 shows a comparison of this synthetic/modified full length HD-73 gene versus the wild-type full length HD-73 gene.

[0085] Another full length B.t.k. gene consists of a fully synthetic HD-73 sequence which incorporates the synthetic HD-73 sequence of Figure 4 from nucleotide 1-1845 plus a synthetic sequence encoding amino acids 616 to the C-terminus of the native protein. The C-terminal synthetic portion has been designed to eliminate putative polyadenylation signals and ATTTA sequences and to include plant preferred codons. Figure 11 shows a comparison of this fully synthetic full length HD-73 gene versus the wild-type full length HD-73 gene.

[0086] Alternatively, another full length B.t.k. gene consists of a fully synthetic sequence comprising base pairs 1-1830 of B.t.k. HD-1 (Figure 3) and base pairs 1834-3534 of B.t.k. HD-73 (Figure 11).

Example 4 -- Expression of Modified and Synthetic B.t.k. HD-1 and Synthetic HD-73

[0087] A number of plant transformation vectors for the expression of B.t.k. genes were constructed by incorporating the structural coding sequences of the previously described genes into the plant transformation cassette vector pMON893. The respective intermediate transformation vector is inserted into a suitable disarmed Agrobacterium vector such as A. tumefaciens ACO, supra. Tissue explants are cocultured with the disarmed Agrobacterium vector, and plants are regenerated under selection for kanamycin resistance using known protocols: tobacco (Horsch et al., 1985); tomato (McCormick et al., 1986) and cotton (Trolinder et al., 1987).

a) Tobacco. 


[0088] The levels of B.t.k. HD-1 protein in transgenic tobacco plants containing pMON9921 (wild-type truncated), pMON5370 (modified HD-1, Example 1, Figure 2) and pMON5377 (synthetic HD-1, Example 2, Figure 3) were analyzed by Western analysis. Leaf tissue was frozen in liquid nitrogen, ground to a fine powder and then ground in SDS-PAGE sample buffer at a 1:2 (wt:volume) ratio. Samples were frozen on dry ice, then incubated for 10 minutes in a boiling water bath and microfuged for 10 minutes. The protein concentration of the supernatant was determined by the method of Bradford (Anal. Biochem. 72:248-254). Fifty µg of protein was run per lane on 9% SDS-PAGE gels, the protein was transferred to nitrocellulose, and the B.t.k. HD-1 protein was visualized using antibodies produced against B.t.k. HD-1 protein as the primary antibody and an alkaline phosphatase conjugated second antibody as described by the manufacturer (Promega, Madison, WI). Purified HD-1 tryptic fragment was used as the control. Whereas the B.t.k. protein from tobacco plants containing pMON9921 was below the level of detection, the B.t.k. protein from plants containing the modified (pMON5370) and synthetic (pMON5377) genes was easily detected. The B.t.k. protein from plants containing pMON9921 remained undetectable even with 10-fold longer incubation times. The relative levels of B.t.k. HD-1 protein in these plants are estimated in Table VIII. Because the protein from plants containing pMON9921 was not observed, the level of protein in these plants was estimated from the relative mRNA levels (see below). Plants containing the modified gene (pMON5370) expressed approximately 100-fold more B.t.k. protein than plants containing the wild-type gene (pMON9921). Plants containing the fully synthetic B.t.k. HD-1 gene (pMON5377) expressed approximately five-fold more protein than plants containing the modified gene. The modified gene contributes the majority of the increase in B.t.k. expression observed. The plants used to generate the above data are the best representatives from each construct, based either on a tobacco hornworm bioassay or on data derived from previous Western analysis.




Table VIII

Expression of B.t.k. HD-1 Protein in Transgenic Tobacco

Gene Description   Vector     B.t.k. Protein Concentration*   Fold Increase in B.t.k. Expression
Wild type          pMON9921   10                              1
Modified           pMON5370   1000                            100
Synthetic          pMON5377   5000                            500

* B.t.k. protein concentrations are expressed in ng/mg of total soluble protein. The level of B.t.k. protein for plants containing the wild-type gene is estimated from mRNA levels.
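The fold increases in Table VIII are ratios to the wild-type level, and the ng/mg units convert directly to percent of total soluble protein (1 mg = 10^6 ng, so 1000 ng/mg is 0.1%). A small sketch of both calculations follows, using the values from Table VIII. The function names are hypothetical, and the wild-type entry is the mRNA-based estimate described in the table footnote.

```python
# B.t.k. HD-1 protein levels from Table VIII, in ng per mg total soluble protein.
# The wild-type value is itself an estimate derived from mRNA levels.
TABLE_VIII = {"pMON9921": 10, "pMON5370": 1000, "pMON5377": 5000}


def fold_increase(levels, reference):
    """Express each level relative to the reference construct."""
    base = levels[reference]
    return {name: value / base for name, value in levels.items()}


def ng_per_mg_to_percent(ng_per_mg):
    """ng per mg is parts per million (1 mg = 1,000,000 ng), so divide by
    10,000 to get percent of total soluble protein."""
    return ng_per_mg / 10_000


folds = fold_increase(TABLE_VIII, "pMON9921")
for name, level in TABLE_VIII.items():
    print(name, folds[name], f"{ng_per_mg_to_percent(level):g}%")
# pMON9921 1.0 0.001%
# pMON5370 100.0 0.1%
# pMON5377 500.0 0.5%
```

The same conversion puts the approximately 1000 ng/mg of HD-73 protein later reported for pMON5383 at roughly 0.1% of total soluble protein.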



[0089] Plants containing these genes were tested for bioactivity to determine whether the increased quantities of protein observed by Western analysis result in a corresponding increase in bioactivity. Leaves from the same plants used for the Western data in Table 1 were tested for bioactivity against two insects. A detached leaf bioassay was first done using tobacco hornworm, an extremely sensitive lepidopteran insect. Leaves from all three transgenic tobacco plants were totally protected, and 100% mortality of tobacco hornworm was observed (see Table IX below). A much less sensitive insect, beet armyworm, was then used in another detached leaf bioassay. Beet armyworm is approximately 500-fold less sensitive to B.t.k. HD-1 protein than tobacco hornworm. The difference in sensitivity of these two insects was determined using purified HD-1 protein in a diet incorporation assay (see below). Plants containing the wild-type gene (pMON9921) showed only minimal protection against beet armyworm, whereas plants containing the modified gene showed almost complete protection and plants containing the fully synthetic gene were totally protected against beet armyworm damage. The results of these bioassays confirm the levels of B.t.k. HD-1 expression observed in the Western analysis and demonstrate that the increased levels of B.t.k. HD-1 protein correlate with increased insecticidal activity.



Table IX

Protection of Tobacco Plants from Tobacco Hornworm and Beet Armyworm

Gene Description   Vector     Tobacco Hornworm Damage*   Beet Armyworm Damage*
None               None       NL                         NL
Wild type          pMON9921   0                          3
Modified           pMON5370   0                          1
Synthetic          pMON5377   0                          0

* Extent of insect damage was rated: 0, no damage; 1, slight; 2, moderate; 3, severe; NL, no leaf left.



[0090] The bioactivity of the B.t.k. HD-1 protein produced by these transgenic plants was further investigated to quantify the relative activities more accurately. Leaf tissue from tobacco plants containing the wild-type, modified and synthetic genes was ground in 100 mM sodium carbonate buffer, pH 10, at a 1:2 (wt:vol) ratio. Particulate material was removed by centrifugation. The supernatant was incorporated into a synthetic diet similar to that described by Marrone et al. (1985). The diet medium was prepared on the day of the test, with the plant extract solutions incorporated in place of the 20% water component. One ml of the diet was aliquoted into 96-well plates.

[0091] After the diet dried, one neonate tobacco budworm larva was added to each well. Sixteen insects were tested with each plant sample. The plates were incubated at 27°C. After seven days, the larvae from each treatment were combined and weighed on an analytical balance. The average weight per insect was calculated and compared to a standard curve relating B.t.k. protein concentrations to average larval weight. Insect weight was inversely proportional (in a logarithmic manner) to the relative increase in B.t.k. protein concentration. The amount of B.t.k. HD-1 protein, based on the extent of larval growth inhibition, was determined for two different plants containing each of the three genes. The specific activity (ng of B.t.k. HD-1 per mg of plant protein) was determined for each plant. Plants containing the modified HD-1 gene (pMON5370) averaged approximately 1400 ng (1200 and 1600 ng) of B.t.k. HD-1 per mg of plant extract protein. This value compares closely with the 1000 ng of B.t.k. HD-1 protein per mg of plant extract protein determined by Western analysis (Table I). B.t.k. HD-1 concentrations for the plants containing the synthetic HD-1 gene averaged approximately 8200 ng (7200 and 9200 ng) of B.t.k. HD-1 protein per mg of plant extract protein. This number compares well to the 5000 ng of HD-1 protein per mg of plant extract protein estimated by Western analysis. Likewise, plants containing the synthetic gene showed approximately a six-fold higher specific activity than the corresponding plants containing the modified gene in these bioassays. In the Western analysis the ratio was approximately 10-fold; again, both are in good agreement. The level of B.t.k. protein in plants containing the wild-type HD-1 gene (pMON9921) was too low to give a significant decrease in larval weight and hence was below a level that could be quantitated in this assay. In conclusion, the levels of B.t.k. HD-1 protein determined by both the bioassays and the Western analysis for these plants containing the modified and synthetic genes agree, which demonstrates that the B.t.k. HD-1 protein produced by these plants is biologically active.
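The diet-incorporation quantitation reads an unknown B.t.k. concentration off a standard curve of larval weight against toxin concentration. A minimal sketch follows, assuming the curve is linear in log10(concentration), as the inverse logarithmic relationship described above suggests. The standard points, their units, and the function names are hypothetical.

```python
import math


def fit_log_standard_curve(concentrations, weights):
    """Least-squares fit of  weight = a - b * log10(concentration).
    Returns (a, b); b > 0 when larval weight falls as toxin concentration rises."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weights))
    slope = sxy / sxx
    return mean_y - slope * mean_x, -slope


def concentration_from_weight(weight, a, b):
    """Invert the fitted curve: log10(concentration) = (a - weight) / b."""
    return 10 ** ((a - weight) / b)


# Hypothetical standard curve: purified toxin (ng/ml diet) vs. mean larval weight (mg).
a, b = fit_log_standard_curve([1, 10, 100, 1000], [10.0, 8.0, 6.0, 4.0])
print(round(a, 6), round(b, 6))                        # 10.0 2.0
print(round(concentration_from_weight(5.0, a, b), 1))  # 316.2

# Ratio of the bioassay averages reported above (synthetic vs. modified gene).
print(round(8200 / 1400, 1))                           # 5.9
```

Fitting in log space keeps each tenfold step in concentration equally weighted, which matches the logarithmic response the text describes.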

[0092] The levels of mRNA were determined in the plants containing the wild-type B.t.k. HD-1 gene (pMON9921) and the modified gene (pMON5370) to establish whether the increased levels of protein production result from increased transcription or translation. mRNA from plants containing the synthetic gene could not be analyzed directly with the same DNA probe as used for the wild-type and modified genes because of the numerous changes made in the coding sequence. mRNA was isolated and hybridized with a single-stranded DNA probe homologous to approximately the 5' 90 bp of the wild-type or modified gene coding sequences. The hybrids were digested with S1 nuclease and the protected probe fragments analyzed by gel electrophoresis. Because the procedure used a large excess of probe and a long hybridization time, the amount of protected probe is proportional to the amount of B.t.k. mRNA present in the sample. Two plants expressing the modified gene (pMON5370) were found to produce up to ten-fold more RNA than a plant expressing the wild-type gene (pMON9921).

[0093] The increased mRNA level from the modified gene is consistent with the result expected from the modifications introduced into this gene. However, this 10-fold increase in mRNA with the modified gene compared to the wild-type gene contrasts with the 100-fold increase in B.t.k. protein from these genes in tobacco plants. If the two mRNAs were equally well translated, then a 10-fold increase in stable mRNA would be expected to yield a 10-fold increase in protein. The larger increase in protein indicates that the modified gene mRNA is translated at about a 10-fold higher efficiency than the wild-type mRNA. Thus, because 100 = 10 × 10, about half of the total effect on gene expression (on a logarithmic scale) can be explained by changes in mRNA levels and about half by changes in translational efficiency. This increase in translational efficiency is striking in that only about 9.5% of the codons have been changed in the modified gene; that is, this effect is clearly not due to wholesale codon usage changes. The increased translational efficiency could be due to changes in mRNA secondary structure that affect translation, or to the removal of specific translational blockades due to specific codons that were changed.
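The split into "about half" mRNA effect and "about half" translational effect holds on a logarithmic scale, assuming protein level is proportional to the product of mRNA level and translational efficiency. A short sketch of that decomposition follows (the function name is hypothetical).

```python
import math


def partition_fold_change(protein_fold, mrna_fold):
    """Split a protein fold change into mRNA-level and translational parts,
    assuming protein level is proportional to mRNA level x translational efficiency.
    Returns (translational_efficiency_fold, share_of_log_effect_from_mRNA)."""
    if protein_fold <= 1 or mrna_fold <= 0:
        raise ValueError("expects an overall increase and a positive mRNA fold change")
    efficiency_fold = protein_fold / mrna_fold
    mrna_share = math.log(mrna_fold) / math.log(protein_fold)
    return efficiency_fold, mrna_share


# Modified vs. wild-type HD-1 gene in tobacco: ~100-fold protein, ~10-fold mRNA.
efficiency, share = partition_fold_change(100, 10)
print(efficiency, round(share, 2))   # 10.0 0.5
```

On a linear scale the same numbers would suggest that mRNA accounts for only a tenth of the increase, which is why the logarithmic framing is the meaningful one for multiplicative effects.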
[0094] The increased expression seen with the synthetic HD-1 gene was also seen with a synthetic HD-73 gene in tobacco. B.t.k. HD-73 protein was not detected in extracts of tobacco plants containing the wild-type truncated HD-73 gene (pMON5367), whereas B.t.k. HD-73 protein was easily detected in extracts from tobacco plants containing the synthetic HD-73 gene of Figure 4 (pMON5383). Approximately 1000 ng of B.t.k. HD-73 protein was detected per mg of total soluble plant protein.

[0095] As described in Example 3 above, the B.t.k. HD-73 protein encoded in pMON5383 contains a small C-terminal extension of amino acids not encoded in the wild-type HD-73 protein. These extra amino acids had no effect on insect toxicity or on increased plant expression. A second synthetic HD-73 gene was constructed as described in Example 3 (Figure 8) and used to transform tobacco (pMON5390). Analysis of plants containing pMON5390 showed that this gene was expressed at levels comparable to that of pMON5383 and that these plants had similar insecticidal efficacy.
[0096] In tobacco plants the synthetic HD-1 gene was expressed at approximately a 5-fold higher level than the synthetic HD-73 gene. However, this synthetic HD-73 gene was still expressed at least 100-fold better than the wild-type HD-73 gene. The HD-73 protein is approximately 5-fold more toxic to many insect pests than the HD-1 protein, so the synthetic HD-1 and HD-73 genes provide approximately comparable insecticidal efficacy in tobacco.

[0097] The full length B.t.k. HD-73 genes described in Example 3 were also incorporated into the plant transformation vector pMON893 so that they were expressed from the En 35S promoter. The synthetic/wild-type full length HD-73 gene of Figure 9 was incorporated into pMON893 to create pMON10505. The synthetic/modified full length HD-73 gene of Figure 10 was incorporated into pMON893 to create pMON10526. The fully synthetic HD-73 gene of Figure 11 was incorporated into pMON893 to create pMON10518. These vectors were used to obtain transformed tobacco plants, and the plants were analyzed for insecticidal efficacy and for B.t.k. HD-73 protein levels by Western blot or ELISA immunoassay.

[0098] Tobacco plants containing all three of these full length B.t.k. genes produced detectable B.t.k. protein and showed 100% mortality of tobacco hornworm. This result is surprising in light of previously reported attempts to express full length B.t.k. genes in transgenic plants. Vaeck et al. (1987) reported that a full length B.t.k. berliner gene similar to our HD-1 gene could not be detectably expressed in tobacco. Barton et al. (1987) reported a similar result for another full length gene from B.t.k. HD-1 (the so-called 4.5 kb gene), and further indicated that tobacco callus containing this gene became necrotic, indicating that the full length gene product was toxic to plant cells. Fischhoff et al. (1987) reported that the full length B.t.k. HD-1 gene in tomato was poorly expressed compared to a truncated gene, and no plants that were fully toxic to tobacco hornworm could be recovered. All three of the above reports indicated much higher expression levels and recovery of toxic plants if the respective B.t.k. genes were truncated. Adang et al. reported that the full length HD-73 gene yielded a few tobacco plants with some biological activity (none were highly toxic) against hornworm and barely detectable B.t.k. protein. They also noted that the major B.t.k. mRNA in these plants was a truncated 1.7 kb species that would not encode a functional toxin, indicating improper expression of the gene in tobacco. In contrast to all of these reports, the three full length B.t.k. HD-73 genes described above all lead to relatively high levels of protein and high levels of insect toxicity.

[0099] B.t.k. protein and mRNA levels in tobacco plants are shown in Table X for these three vectors. As can be seen from the table, the synthetic/wild-type gene (pMON10506) produces B.t.k. protein as about 0.01% of total soluble protein; the synthetic/modified gene produces B.t.k. as about 0.02% of total soluble protein; and the fully synthetic gene produces B.t.k. as about 0.2% of total soluble protein. B.t.k. mRNA was analyzed in these plants by Northern blot analysis using the common 5' synthetic half of the genes as a probe. As shown in Table X, the increased protein levels can largely be attributed to increased mRNA levels. In comparison with the truncated modified and synthetic genes, this could indicate that the major contributors to increased translational efficiency are in the 5' half of the gene, while the 3' half of the gene contains mostly determinants of mRNA stability. The increased protein levels also indicate that increasing the amount of the full length gene that is synthetic or modified increases B.t.k. protein levels. Compared to the truncated synthetic B.t.k. HD-73 genes (pMON5383 or pMON5390), the fully synthetic gene (pMON10518) produces as much or slightly more B.t.k. protein, demonstrating that the full length genes are capable of being expressed at high levels in plants. These tobacco plants with high levels of full length HD-73 protein show no evidence of abnormality and are fully fertile. The B.t.k. protein levels in these plants also produce the expected levels of insect toxicity based on feeding studies with beet armyworm or diet incorporation assays of plant extracts with tobacco budworm. The B.t.k. protein detected by Western blot analysis in these tobacco plants often contains a varying amount of a protein of about 80 kDa, which is apparently a proteolytic fragment of the full length protein. The C-terminal half of the full length protein is known to be proteolytically sensitive, and similar proteolytic fragments are seen from the full length gene in E. coli and in B.t. itself. These fragments are fully insecticidal. The Northern analysis indicated that essentially all of the mRNA from these full length genes was of the expected full length size. There is no evidence of truncated mRNAs that could give rise to the 80 kDa protein fragment. In addition, it is possible that the fragment is not present in intact plant cells and is merely due to proteolysis during extraction for immunoassay.



Table X

Full Length B.t.k. HD-73 Protein and mRNA Levels in Transgenic Tobacco Plants

Gene description       Vector       B.t.k. protein concentration    Relative B.t.k. mRNA level
Synthetic/wild type    pMON10506    >100                            0.5
Synthetic/modified     pMON10526    400                             1
Fully synthetic        pMON10518    >2000                           40

[0100] Thus, there is no serious impediment to producing high levels of B.t.k. HD-73 protein in plants from synthetic genes, and this is expected to be true of other full length lepidopteran-active genes such as B.t.k. HD-1 or B.t. entomocidus. The fully synthetic B.t.k. HD-1 gene of Example 3 has been assembled in plant transformation vectors such as pMON893.

[0101] The fully synthetic gene in pMON10518 was also utilized in another plant vector and analyzed in tobacco plants. Although the CaMV35S promoter is generally a high level constitutive promoter in most plant tissues, the expression level of genes driven by the CaMV35S promoter is low in floral tissue relative to the levels seen in leaf tissue. Because the economically important targets damaged by some insects are floral parts or are derived from floral parts (e.g., cotton squares and bolls, tobacco buds, tomato buds and fruit), it may be advantageous to increase the expression of B.t. protein in these tissues over that obtained with the CaMV35S promoter.

[0102] The 35S promoter of Figwort Mosaic Virus (FMV) is analogous to the CaMV35S promoter. This promoter has been isolated and engineered into a plant transformation vector analogous to pMON893. Relative to the CaMV promoter, the FMV 35S promoter is highly expressed in floral tissue, while still providing similarly high levels of gene expression in other tissues such as leaf. A plant transformation vector, pMON10517, was constructed in which the full length synthetic B.t.k. HD-73 gene of Figure 11 was driven by the FMV 35S promoter. This vector is identical to pMON10518 of Example 3 except that the FMV promoter is substituted for the CaMV promoter. Tobacco plants transformed with pMON10517 and pMON10518 were obtained and compared for expression of the B.t.k. protein by Western blot or ELISA immunoassay in leaf and floral tissue. This analysis showed that pMON10517, containing the FMV promoter, expressed the full length HD-73 protein at higher levels in floral tissue than pMON10518, containing the CaMV promoter. Expression of the full length B.t.k. HD-73 protein from pMON10517 in leaf tissue is comparable to that seen with the most highly expressing plants containing pMON10518. However, when floral tissue was analyzed, tobacco plants containing pMON10518 that had high levels of B.t.k. protein in leaf tissue did not have detectable B.t.k. protein in the flowers. On the other hand, flowers of tobacco plants containing pMON10517 had levels of B.t.k. protein nearly as high as the levels in leaves, at approximately 0.05% of total soluble protein. This analysis showed that the FMV promoter could be used to produce relatively high levels of B.t.k. protein in floral tissue compared to the CaMV promoter.

b) Tomato. 

[0103] The wild-type, modified and synthetic B.t.k. HD-1 genes tested in tobacco were introduced into other plants to demonstrate the broad utility of this invention. Transgenic tomatoes were produced which contain these three genes. Data show that the increased expression observed with the modified and synthetic genes in tobacco also extends to tomato. Whereas the B.t.k. HD-1 protein is only barely detectable in plants containing the wild-type HD-1 gene (pMON9921), B.t.k. HD-1 was readily detected, and its levels determined, in plants containing the modified (pMON5370) or synthetic (pMON5377) genes. Expression levels for the plants containing the wild-type, modified and synthetic HD-1 genes were approximately 10, 100 and 500 ng per mg of total plant extract (see Table XI below). The increase in B.t.k. HD-1 protein for the modified gene accounted for the majority of the increase observed: 10-fold higher than in plants containing the wild-type gene, compared to only an additional five-fold increase for plants containing the synthetic gene. Again, the site-directed changes made in the modified gene are the major contributors to the increased expression of B.t.k. HD-1.
Table XI

B.t.k. HD-1 Expression in Transgenic Tomato Plants

Gene description    Vector      B.t.k. protein* concentration    Fold increase in B.t.k. expression
Wild type           pMON9921    10                               1
Modified            pMON5370    100                              10
Synthetic           pMON5377    500                              50

* B.t.k. HD-1 protein concentrations are expressed in ng/mg of total soluble plant protein. Data for plants containing the wild-type gene are estimates from mRNA levels, and protein levels were determined by ELISA.
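The fold-increase column of Table XI is each protein concentration divided by the wild-type concentration. The Python sketch below reproduces that column. It also expresses the modified gene's share of the total increase on a logarithmic scale, which is an illustrative assumption (not stated in the text) for reading "accounted for the majority of increase observed".

```python
import math

# Protein concentrations from Table XI (ng/mg total soluble protein).
# The wild-type value is itself an estimate, per the table footnote.
concentrations = {
    "Wild type (pMON9921)": 10.0,
    "Modified (pMON5370)": 100.0,
    "Synthetic (pMON5377)": 500.0,
}

def fold_increase(values: dict[str, float], reference: str) -> dict[str, float]:
    """Divide every concentration by the reference (wild-type) concentration."""
    base = values[reference]
    return {name: value / base for name, value in values.items()}

folds = fold_increase(concentrations, "Wild type (pMON9921)")
for name, fold in folds.items():
    print(f"{name}: {fold:g}-fold")

# Assumption: compare steps on a log scale, so that a 10-fold step followed
# by a 5-fold step is split as log(10) : log(5) of the total log(50).
total = folds["Synthetic (pMON5377)"]
modified = folds["Modified (pMON5370)"]
share = math.log(modified) / math.log(total)
print(f"Wild type -> modified step: {share:.0%} of the log-fold increase")
```

On this log reading, the wild-type to modified step contributes somewhat more than half of the overall 50-fold increase.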



[0104] These differences in B.t.k. HD-1 expression were confirmed with bioassays against tobacco hornworm and beet armyworm. Leaves from tomato plants containing each of these genes controlled tobacco hornworm damage and produced 100% mortality. With beet armyworm, leaves from plants containing the wild-type HD-1 gene (pMON9921) showed significant damage, leaves from plants containing the modified gene (pMON5370) showed less damage, and leaves from plants containing the synthetic gene (pMON5377) were completely protected (see Table XII below).



Table XII

Protection of Tomato Plants from Tobacco Hornworm and Beet Armyworm

Gene description    Vector      Tobacco hornworm damage*    Beet armyworm damage*
None                None        NL                          NL
Wild type           pMON9921    0                           3
Modified            pMON5370    0                           1
Synthetic           pMON5377    0                           0

* Damage was rated as shown in Table IX.



[0105] The generality of the synthetic gene approach was extended in tomato with a synthetic B.t.k. HD-73 gene.
[0106] In tomato, extracts from plants containing the wild-type truncated HD-73 gene (pMON5367) showed no detectable HD-73 protein. Extracts from plants containing the synthetic HD-73 gene (pMON5383) showed high levels of B.t.k. HD-73 protein, approximately 2000 ng per mg of plant extract protein. These data clearly demonstrate that the changes made in the synthetic HD-73 gene lead to dramatic increases in the expression of the HD-73 protein in tomato as well as in tobacco.

[0107] In contrast to tobacco, the synthetic HD-73 gene in tomato is expressed at approximately 4-fold to 5-fold higher levels than the synthetic HD-1 gene. Because the HD-73 protein is about 5-fold more active than the HD-1 protein against many insect pests, including Heliothis species, the increased expression of synthetic HD-73 compared to synthetic HD-1 corresponds to about a 25-fold increase in insecticidal efficacy in tomato.
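The efficacy estimate in the preceding paragraph multiplies the expression advantage by the specific-activity advantage. The Python sketch below makes that arithmetic explicit. It assumes, as the paragraph implicitly does, that insecticidal efficacy scales multiplicatively with the amount of toxin and its per-unit activity. The "about 25-fold" in the text corresponds to the upper end of the resulting range.

```python
# Synthetic HD-73 vs synthetic HD-1 expression in tomato (text: ~4- to 5-fold).
expression_fold = (4.0, 5.0)
# HD-73 vs HD-1 specific activity against e.g. Heliothis species (text: ~5-fold).
activity_fold = 5.0

def relative_efficacy(expression: float, activity: float) -> float:
    """Assumed model: efficacy = (amount of toxin) x (activity per unit toxin)."""
    return expression * activity

low, high = [relative_efficacy(e, activity_fold) for e in expression_fold]
print(f"Estimated efficacy gain: {low:g}- to {high:g}-fold")
```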

[0108] In order to determine the mechanisms involved in the increased expression of modified and synthetic B.t.k. HD-1 genes in tomato, S1 nuclease analysis of mRNA levels from transformed tomato plants was performed. As indicated above, a similar analysis had been performed with tobacco plants, and this analysis showed that the modified gene produced up to 10-fold more mRNA than the wild-type gene. The analysis in tomato utilized a different DNA probe that allowed the analysis of wild-type (pMON9921), modified (pMON5370) and synthetic (pMON5377) HD-1 genes with the same probe. This probe was derived from the 5' untranslated region of the CaMV35S promoter in pMON893 that was common to all three of these vectors (pMON9921, pMON5370 and pMON5377). This S1 analysis indicated that B.t.k. mRNA levels from the modified gene were 3- to 5-fold higher than for the wild-type gene, and that mRNA levels for the synthetic gene were about 2- to 3-fold higher than for the modified gene. Three independent transformants were analyzed for each gene. Compared to the fold increases in B.t.k. HD-1 protein from these genes in tomato shown in Table XI, these mRNA increases can explain about half of the total protein increase, as was seen in tobacco for the wild-type and modified genes. For tomato, the total mRNA increase from wild-type to synthetic is about 6- to 15-fold, compared to a protein increase of about 50-fold. This result is similar to that seen for tobacco in comparing the wild-type and modified genes, and it extends to the synthetic gene as well. That is, about half of the total fold increase in B.t.k. protein from wild-type to modified genes can be explained by mRNA increases and about half by enhanced translational efficiency. The same is also true in comparing the modified gene to the synthetic gene: although there is an additional increase in mRNA levels, this mRNA increase can explain only about half of the total protein increase.
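"About half" of a fold increase is easiest to interpret on a logarithmic scale. The Python sketch below assumes a simple model in which protein level equals mRNA level times translational efficiency, so the log of the protein fold increase splits additively into an mRNA term and a translational term. It applies that split to the tomato figures above (6- to 15-fold mRNA, about 50-fold protein). The model is an illustrative assumption, not the analysis method used in the specification.

```python
import math

def mrna_share(mrna_fold: float, protein_fold: float) -> float:
    """Fraction of the log protein increase attributable to the mRNA increase.

    Assumes protein = mRNA x translational efficiency, so that
    log(protein fold) = log(mRNA fold) + log(translational-efficiency fold).
    """
    return math.log(mrna_fold) / math.log(protein_fold)

protein_fold = 50.0  # wild type -> synthetic HD-1 protein in tomato (Table XI)
for mrna_fold in (6.0, 15.0):  # 3-5 fold x 2-3 fold from the S1 analysis
    translational_fold = protein_fold / mrna_fold
    print(
        f"mRNA {mrna_fold:g}-fold: {mrna_share(mrna_fold, protein_fold):.0%} of the "
        f"log increase; residual translational gain about {translational_fold:.1f}-fold"
    )
```

Under this model the mRNA increase accounts for roughly 46% to 69% of the log-scale protein increase, which is consistent with the "about half" reading.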
[0109] The full length B.t.k. genes described above were also used to transform tomato plants, and these plants were analyzed for B.t.k. protein and insecticidal efficacy. The results of this analysis are shown in Table XIII. Plants containing the synthetic/wild-type gene (pMON10506) produce the B.t.k. HD-73 protein at levels of about 0.01% of their total soluble protein. Plants containing the synthetic/modified gene (pMON10526) produce about 0.04% B.t.k. protein, and plants containing the fully synthetic gene (pMON10518) produce about 0.2% B.t.k. protein. These results are very similar to the tobacco plant results for the same genes. mRNA levels estimated by Northern blot analysis in tomato also increase in parallel with the protein level increase. As for tobacco with these three genes, most of the protein increase can be attributed to increased mRNA, with a small component of translational efficiency increase indicated for the fully synthetic gene. The highest levels of full length B.t.k. protein (from pMON10518) are comparable to or just slightly lower than the highest levels observed for the truncated HD-73 genes (pMON5383 and pMON5390). Tomato plants expressing these full length genes have the insecticidal activity expected for the observed protein levels, as determined by feeding assays with beet armyworm or by diet incorporation of plant extracts with tobacco hornworm.



Table XIII

Full Length B.t.k. HD-73 Protein and mRNA Levels in Transgenic Tomato Plants

Gene description       Vector       B.t.k. protein concentration    Relative B.t.k. mRNA level
Synthetic/wild type    pMON10506    100                             1
Synthetic/modified     pMON10526    400                             2-4
Fully synthetic        pMON10518    2000                            10
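The concentrations in Table XIII match the percentages quoted in paragraph [0109] (0.01%, 0.04% and 0.2%) if they are read as ng of B.t.k. protein per mg of total soluble protein. That unit is an assumption here, inferred from that agreement. The Python sketch below shows the conversion. Since 1 mg = 10^6 ng, a value in ng/mg divided by 10^4 gives percent. The same conversion links the 1000 to 2000 ng/mg reported for cotton to 0.1% to 0.2%.

```python
def ng_per_mg_to_percent(ng_per_mg: float) -> float:
    """Convert ng toxin per mg total soluble protein to % of total soluble protein.

    1 mg = 1,000,000 ng, so the mass fraction is ng_per_mg / 1e6 and the
    percentage is that fraction x 100, i.e. ng_per_mg / 1e4.
    """
    return ng_per_mg / 1e4

def percent_to_ng_per_mg(percent: float) -> float:
    """Inverse conversion: % of total soluble protein -> ng per mg."""
    return percent * 1e4

# Assumed unit: Table XIII values interpreted as ng/mg total soluble protein.
for vector, conc in [("pMON10506", 100), ("pMON10526", 400), ("pMON10518", 2000)]:
    print(f"{vector}: {conc} ng/mg = {ng_per_mg_to_percent(conc):.2f}%")
```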



c) Cotton. 

[0110] The generality of the increased expression of B.t.k. HD-1 and B.t.k. HD-73 by use of the modified and synthetic genes was extended to cotton. Transgenic calli were produced which contain the wild-type (pMON9921) and the synthetic HD-1 (pMON5377) genes. Here again, the B.t.k. HD-1 protein produced from calli containing the wild-type gene was not detected, whereas calli containing the synthetic HD-1 gene expressed the HD-1 protein at easily detectable levels. The HD-1 protein was produced at approximately 1000 ng/mg of plant calli extract protein. Again, to ensure that the protein produced by the transgenic cotton calli was biologically active and that the increased expression observed with the synthetic gene translated to increased biological activity, extracts of cotton calli were made in a similar manner as described for tobacco plants, except that the calli were first dried between Whatman filter paper to remove as much of the water as possible. The dried calli were then ground in liquid nitrogen and further ground in 100 mM sodium carbonate buffer, pH 10. Approximately 0.5 ml aliquots of this material were applied to tomato leaves with a paint brush. After the leaf dried, five tobacco hornworm larvae were applied to each of two leaf samples. Leaves painted with extract from control calli were completely destroyed. Leaves painted with extract from calli containing the wild-type HD-1 gene (pMON9921) showed severe damage. Leaves painted with extract from calli containing the synthetic HD-1 gene (pMON5377) showed no damage (see Table XIV below).



Table XIV

Protection against Tobacco Hornworm by Tomato Leaves Painted with Extracts Prepared from Cotton Calli Containing a Control, the Wild-Type B.t.k. HD-1 Gene, Synthetic HD-1 Gene or Synthetic HD-73 Gene

Gene description    Vector      Tobacco hornworm damage*
Control             Control     NL
Wild type HD-1      pMON9921    3
Synthetic HD-1      pMON5377    0
Synthetic HD-73     pMON5383    0

* Damage was rated as shown in Table IX.



[0111] Cotton calli were also produced containing another synthetic gene, a gene encoding B.t.k. HD-73. The preparation of this gene is described in Example 3. Calli containing the synthetic HD-73 gene produced the corresponding HD-73 protein at even higher levels than the calli which contained the synthetic HD-1 gene. Extracts made from calli containing the HD-73 synthetic gene (pMON5383) showed complete control of tobacco hornworm when painted onto tomato leaves as described above for extracts containing the HD-1 protein (see Table XIV).
[0112] Transgenic cotton plants containing the synthetic B.t.k. HD-1 gene (pMON5377) or the synthetic B.t.k. HD-73 gene (pMON5383) have also been examined. These plants produce the HD-1 or HD-73 proteins at levels comparable to those seen in cotton callus with the same genes and comparable to tomato and tobacco plants with these genes. For either synthetic truncated HD-1 or HD-73 genes, cotton plants expressing B.t.k. protein at 1000 to 2000 ng/mg total protein (0.1% to 0.2%) were recovered at a high frequency. Insect feeding assays were performed with leaves from cotton plants expressing the synthetic HD-1 or HD-73 genes. These leaves showed no damage (rating of 0) when challenged with larvae of cabbage looper (Trichoplusia ni), and only slight damage when challenged with larvae of beet armyworm (Spodoptera exigua). Damage ratings are as defined in Table IX above. This demonstrated that cotton plants as well as calli expressed the synthetic HD-1 or HD-73 genes at high levels and that those plants were protected from damage by lepidopteran insect larvae.

[0113] Transgenic cotton plants containing either the synthetic truncated HD-1 gene (pMON5377) or the synthetic truncated HD-73 gene (pMON5383) were also assessed for protection against cotton bollworm at the whole plant level in the greenhouse. This is a more realistic test of the ability of these plants to produce an agriculturally acceptable level of control. The cotton bollworm (Heliothis zea) is a major pest of cotton that produces economic damage by destroying terminals, squares and bolls, and protection of these fruiting bodies as well as the leaf tissue will be important for effective insect control and adequate crop protection. To test the protection afforded to whole plants, R1 progeny of cotton plants expressing high levels of either B.t.k. HD-1 (pMON5377) or B.t.k. HD-73 (pMON5383) were assayed by applying 10-15 eggs of cotton bollworm per boll or square to the 20 uppermost squares or bolls on each plant. At least 12 plants were analyzed per treatment. The hatch rate of the eggs was approximately 70%. This corresponds to very high insect pressure compared to the numbers of larvae per plant seen under typical field conditions. Under these conditions, 100% of the bolls on control cotton plants were destroyed by insect damage. For the transgenics, significant boll protection was observed. Plants containing pMON5377 (HD-1) had 70-75% of the bolls survive the intense pressure of this assay. Plants containing pMON5383 (HD-73) had 80% to 90% boll protection. This is likely to be a consequence of the higher activity of HD-73 protein against cotton bollworm compared to HD-1 protein. In cases where the transgenic plants were damaged by the insects, the surviving larvae were delayed in their development by at least one instar.
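The "very high insect pressure" of this greenhouse assay can be put into numbers from the stated protocol. The Python sketch below multiplies eggs per infested site by the number of sites and the hatch rate. It assumes every one of the 20 sites received the stated egg load, so it gives only a rough estimate of hatched larvae per plant.

```python
def larvae_per_plant(eggs_per_site: float, sites: int, hatch_rate: float) -> float:
    """Expected hatched larvae per plant (assumes every site is infested)."""
    return eggs_per_site * sites * hatch_rate

SITES = 20          # uppermost squares or bolls infested per plant
HATCH_RATE = 0.70   # approximate observed hatch rate

low = larvae_per_plant(10, SITES, HATCH_RATE)
high = larvae_per_plant(15, SITES, HATCH_RATE)
print(f"About {low:.0f} to {high:.0f} larvae per plant")
```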
[0114] Therefore, the increased expression obtained with the modified and synthetic genes is not limited to any one crop; tobacco, tomato and cotton calli and cotton plants all showed drastic increases in B.t.k. expression when the plants/calli were produced containing the modified or synthetic genes. Likewise, the utility of the changes made to produce the modified and synthetic B.t.k. HD-1 gene is not limited to the HD-1 gene. The synthetic HD-73 gene also showed drastic increases in expression in all three species.

[0115] In summary, it has been demonstrated that: (1) the genetic changes made in the HD-1 modified gene lead to very significant increases in B.t.k. HD-1 expression; (2) production of a totally synthetic gene led to a further five-fold increase in B.t.k. HD-1 expression; (3) the changes incorporated into the modified HD-1 gene accounted for the majority of the increased B.t.k. expression observed with the synthetic gene; (4) the increased expression was demonstrated in three different plant species (tobacco plants, tomato plants, and cotton calli and cotton plants); (5) the increased expression as observed by Western analysis also correlated with similar increases in bioactivity, showing that the B.t.k. HD-1 proteins produced were comparably active; (6) when the method of the present invention used to design the synthetic HD-1 gene was employed to design a synthetic HD-73 gene, that gene was also expressed at much higher levels in tobacco, tomato and cotton than the equivalent wild-type gene, with consequent increases in bioactivity; and (7) a fully synthetic full length B.t.k. gene was expressed at levels comparable to synthetic truncated genes.

Example 5 -- Synthetic B.t. tenebrionis Gene in Tobacco, Tomato and Potato

[0116] Referring to Figure 12, a synthetic gene encoding a Coleopteran-active toxin is prepared by making the indicated changes in the wild-type gene of B.t. tenebrionis or by de novo synthesis of the synthetic structural gene. The synthetic gene is inserted into an intermediate plant transformation vector such as pMON893. Plasmid pMON893 containing the synthetic B.t.t. gene is then inserted into a suitable disarmed Agrobacterium strain such as A. tumefaciens ACO.

Transformation and Regeneration of Potato 

[0117] Sterile shoot cultures of Russet Burbank are maintained in vials containing 10 ml of PM medium (Murashige and Skoog (MS) inorganic salts, 30 g/l sucrose, 0.17 g/l NaH2PO4·H2O, 0.4 mg/l thiamine-HCl and 100 mg/l myo-inositol, solidified with 1 g/l Gelrite at pH 6.0). When shoots reach approximately 5 cm in length, stem internode segments of 7-10 mm are excised and smeared at the cut ends with a disarmed Agrobacterium tumefaciens vector containing the synthetic B.t.t. gene from a four-day-old plate culture. The stem explants are co-cultured for three days at 23°C on sterile filter paper placed over 1.5 ml of a tobacco cell feeder layer overlaid on 1/10 P medium (1/10 strength MS inorganic salts and organic addenda without casein as in Jarret et al. (1980), 30 g/l sucrose and 8.0 g/l agar). Following co-culture, the explants are transferred to full strength P-1 medium for callus induction, composed of MS inorganic salts, organic additions as in Jarret et al. (1980) with the exception of casein, 3.0 mg/l benzyladenine (BA) and 0.01 mg/l naphthaleneacetic acid (NAA) (Jarret et al., 1980). Carbenicillin (500 mg/l) is included to inhibit bacterial growth, and 100 mg/l kanamycin is added to select for transformed calli. After four weeks the explants are transferred to medium of the same composition but with 0.3 mg/l gibberellic acid (GA3) replacing the BA and NAA (Jarret et al., 1981) to promote shoot formation. Shoots begin to develop approximately two weeks after transfer to shoot induction medium; these are excised and transferred to vials of PM medium for rooting. Shoots are tested for kanamycin resistance conferred by the enzyme neomycin phosphotransferase II by placing a section of the stem onto callus induction medium containing MS organic and inorganic salts, 30 g/l sucrose, 2.25 mg/l BA, 0.186 mg/l NAA, 10 mg/l GA3 (Webb et al., 1983) and 200 mg/l kanamycin to select for transformed cells.

[0118] The synthetic B.t.t. gene described in Figure 12 was placed into a plant expression vector as described in Example 5. The plasmid has the following characteristics: a synthetic BglII fragment of approximately 1800 base pairs was inserted into pMON893 in such a manner that the enhanced 35S promoter would express the B.t.t. gene. This construct, pMON1982, was used to transform both tobacco and tomato. Tobacco plants selected as kanamycin resistant were screened with rabbit anti-B.t.t. antibody. Cross-reactive material was detected at levels predicted to be suitable to cause mortality to CPB. These target insects will not feed on tobacco, but the transgenic tobacco plants do demonstrate that the synthetic gene improves expression of this protein to detectable levels.

[0119] Tomato plants with the pMON1982 construct were determined to produce B.t.t. protein at levels insecticidal to CPB. In initial studies, the leaves of four plants (5190, 5225, 5328 and 5133) showed little or no damage when exposed to CPB larvae (damage rating of 0-1 on a scale of 0 to 4, with 4 as no leaf remaining). Under these conditions the control leaves were completely eaten. Immunological analysis of these plants confirmed the presence of material cross-reactive with anti-B.t.t. antibody. Levels of protein expression in these plants were estimated at approximately 1 to 5 ng of B.t.t. protein in 50 µg of total extractable protein. A total of 17 tomato plants (17 of 65 tested) have been identified which demonstrate protection of leaf tissue from CPB (rating of 0 or 1) and show good insect mortality.
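Expression in this example is reported as ng of B.t.t. protein per 50 µg of total protein, whereas the B.t.k. examples use ng/mg or percent. The Python sketch below converts between them so the levels can be compared directly. It is a unit conversion only and assumes the 50 µg denominator refers to total soluble (extractable) protein in each case. The figures plugged in are those reported in this example for tomato (pMON1982), tobacco (pMON1984) and potato plant 16a.

```python
def ng_per_ug_total_to_percent(ng: float, total_ug: float = 50.0) -> float:
    """Express ng of toxin per `total_ug` micrograms of total protein as a percent."""
    total_ng = total_ug * 1000.0  # 1 µg = 1000 ng
    return 100.0 * ng / total_ng

reported_ranges = {
    "Tomato, pMON1982 (1-5 ng/50 µg)": (1, 5),
    "Tobacco, pMON1984 (10-15 ng/50 µg)": (10, 15),
    "Potato, plant 16a (20-50 ng/50 µg)": (20, 50),
}
for label, (low_ng, high_ng) in reported_ranges.items():
    low_pct = ng_per_ug_total_to_percent(low_ng)
    high_pct = ng_per_ug_total_to_percent(high_ng)
    print(f"{label}: {low_pct:.3f}% to {high_pct:.3f}% of total protein")
```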
[0120] Results similar to those seen in tobacco and tomato with pMON1982 were seen with pMON1984 in the same plant species. pMON1984 is identical to pMON1982 except that the synthetic protease inhibitor (CMTI) is fused upstream of the native proteolytic cleavage site. Levels of expression in tobacco were estimated to be similar to pMON1982, between 10-15 ng per 50 µg of total soluble protein.

[0121] Tomato plants expressing pMON1984 have been identified which protect the leaves from ingestion by CPB. 
The damage rating was 0 with 100% insect mortality. 

[0122] Potato was transformed as described in Example 5 with a vector similar to pMON1982 containing the enhanced CaMV35S/synthetic B.t.t. gene. Leaves of potato plants transformed with this vector were screened by CPB insect bioassay. Of the 35 plants tested, leaves from 4 plants (16a, 13c, 13d and 23a) were totally protected when challenged. Insect bioassays with leaves from three other plants (13e, 1a and 13b) recorded damage levels of 1 on a scale of 0 to 4, with 4 being total devastation of the leaf material. Immunological analysis confirmed the presence of B.t.t. cross-reactive material in the leaf tissue. The level of B.t.t. protein in leaf tissue of plant 16a (damage rating of 0) was estimated at 20-50 ng of B.t.t. protein/50 µg of total soluble protein. The level of B.t.t. protein seen in 16a tissue was consistent with its biological activity. Immunological analysis of 13e and 13b (tissue which scored 1 in damage rating) revealed less protein (5-10 ng/50 µg of total soluble protein) than in plant 16a. Cuttings of plant 16a were challenged with 50 to 200 eggs of CPB in a whole plant assay. Under these conditions 16a showed no damage and 100% mortality of insects, while control potato plants were heavily damaged.

Example 6 -- Synthetic B.t.k. P2 Protein Gene

[0123] The P2 protein is a distinct insecticidal protein produced by some strains of B.t., including B.t.k. HD-1. It is characterized by its activity against both lepidopteran and dipteran insects (Yamamoto and Iizuka, 1983). Genes encoding the P2 protein have been isolated and characterized (Donovan et al., 1988). The P2 proteins encoded by these genes are approximately 600 amino acids in length. These proteins share only limited homology with the lepidopteran-specific P1 type proteins, such as the B.t.k. HD-1 and HD-73 proteins described in previous examples.
[0124] The P2 proteins have substantial activity against a variety of lepidopteran larvae including cabbage looper, tobacco hornworm and tobacco budworm. Because they are active against agronomically important insect pests, the P2 proteins are a desirable candidate for the production of insect tolerant transgenic plants, either alone or in combination with the other B.t. toxins described in the above examples. In some plants, expression of the P2 protein alone might be sufficient to provide protection against damaging insects. In addition, the P2 proteins might provide protection against agronomically important dipteran pests. In other cases, expression of P2 together with the B.t.k. HD-1 or HD-73 protein might be preferred. The P2 proteins should provide at least an additive level of insecticidal activity when combined with the crystal protein toxin of B.t.k. HD-1 or HD-73, and the combination may even provide a synergistic activity. Although the mode of action of the P2 protein is unknown, its distinct amino acid sequence suggests that it functions differently from the B.t.k. HD-1 and HD-73 type of proteins. Production of two insect tolerance proteins with different modes of action in the same plant would minimize the potential for development of insect resistance to B.t. proteins in plants. The lack of substantial DNA homology between P2 genes and the HD-1 and HD-73 genes minimizes the potential for recombination between multiple insect tolerance genes in the plant chromosome.
[0125] The genes encoding the P2 protein, although distinct in sequence from the B.t.k. HD-1 and HD-73 genes, share many common features with these genes. In particular, the P2 protein genes have a high A+T content (65%), multiple potential polyadenylation signal sequences (26) and numerous ATTTA sequences (10). Because of their overall similarity to the poorly expressed wild-type B.t.k. HD-1 and HD-73 genes, the same problems are expected in expression of the wild-type P2 gene as were encountered in the previous examples. Based on the above-described method for designing the synthetic B.t. genes, a synthetic P2 gene has been designed which should be expressed at adequate levels for protection in plants. A comparison of the wild-type and synthetic P2 genes is shown in Figure 13.
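The three sequence features used above to flag a poorly expressed gene (A+T content, potential polyadenylation signals and ATTTA motifs) can be tallied with a short scan. The Python sketch below is illustrative only. `POLYA_MOTIFS` is an assumed, deliberately short set of two commonly cited polyadenylation-like signals, not the full list used in this specification, so its counts will differ from the figures quoted. Overlapping matches are counted on the assumption that adjacent motifs should each be tallied.

```python
import re

# Illustrative motif set (assumption): AATAAA and AATAAT only. The complete
# list of potential plant polyadenylation signals is defined elsewhere in
# this specification and should be substituted for real analyses.
POLYA_MOTIFS = ("AATAAA", "AATAAT")
INSTABILITY_MOTIF = "ATTTA"

def count_overlapping(seq: str, motif: str) -> int:
    """Count occurrences of `motif` in `seq`, allowing overlapping matches."""
    return len(re.findall(f"(?={re.escape(motif)})", seq))

def scan_coding_sequence(seq: str) -> dict[str, float]:
    """Report A+T content (%) and motif counts for a DNA coding sequence."""
    seq = seq.upper()
    at_fraction = (seq.count("A") + seq.count("T")) / len(seq)
    return {
        "A+T %": 100.0 * at_fraction,
        "polyA signals": sum(count_overlapping(seq, m) for m in POLYA_MOTIFS),
        "ATTTA": count_overlapping(seq, INSTABILITY_MOTIF),
    }

# Toy sequence for demonstration only (not a B.t. gene).
report = scan_coding_sequence("ATGAATAAATTTATTTAGCGC")
print(report)
```

In the toy sequence the two ATTTA motifs share an A, so a non-overlapping search such as `str.count` would report only one of them.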

Example 7 -- Synthetic B.t. Entomocidus Gene

[0126] The B.t. entomocidus ("Btent") protein is a distinct insecticidal protein produced by some strains of B.t. bacteria. It is characterized by its high level of activity against some lepidopterans that are relatively insensitive to B.t.k. HD-1 and HD-73, such as Spodoptera species including beet armyworm (Visser et al., 1988). Genes encoding the Btent protein have been isolated and characterized (Honee et al., 1988). The Btent proteins encoded by these genes are approximately the same length as B.t.k. HD-1 and HD-73. These proteins share only 68% amino acid homology with the B.t.k. HD-1 and HD-73 proteins. It is likely that only the N-terminal half of the Btent protein is required for insecticidal activity, as is the case for HD-1 and HD-73. Over the first 625 amino acids, Btent shares only 38% amino acid homology with HD-1 and HD-73.

[0127] Because of their higher activity against Spodoptera species that are relatively insensitive to HD-1 and HD-73, the Btent proteins are a desirable candidate for the production of insect tolerant transgenic plants, either alone or in combination with the other B.t. toxins described in the above examples. In some plants, production of Btent alone might be sufficient to control the agronomically important pests. In other plants, the production of two distinct insect tolerance proteins would provide protection against a wider array of insects. Against those insects where both proteins are active, the combination of the B.t.k. HD-1 or HD-73 type protein plus the Btent protein should provide at least additive insecticidal efficacy, and may even provide a synergistic activity. In addition, because of its distinct amino acid sequence, the Btent protein may have a different mode of action than HD-1 or HD-73. Production of two insecticidal proteins with different modes of action in the same plant would minimize the potential for development of insect resistance to B.t. proteins in plants. The relative lack of DNA sequence homology with the B.t.k. type genes minimizes the potential for recombination between multiple insect tolerance genes in the plant chromosome.

[0128] The genes encoding the Btent protein, although distinct in sequence from the B.t.k. HD-1 and HD-73 genes, share many common features with these genes. In particular, the Btent protein genes have a high A+T content (62%), multiple potential polyadenylation signal sequences (39 in the full length coding sequence and 27 in the first 1875 nucleotides, which are likely to encode the active toxic fragment) and numerous ATTTA sequences (16 in the full length coding sequence and 12 in the first 1875 nucleotides). Because of their overall similarity to the poorly expressed wild-type B.t.k. HD-1 and HD-73 genes, the wild-type Btent genes are expected to exhibit problems in expression similar to those encountered with the wild-type HD-1 and HD-73 genes. Based on the above-described method used for designing the other synthetic B.t. genes, a synthetic Btent gene has been designed which should be expressed at adequate levels for protection in plants. A comparison of the wild-type and synthetic Btent genes is shown in Figure 14.

Example 8 -- Synthetic B.t.k. Genes for Expression in Corn

[0129] High level expression of heterologous genes in corn cells has been shown to be enhanced by the presence of a corn gene intron (Callis et al., 1987). Typically these introns have been located in the 5' untranslated region of the chimeric gene. It has been shown that the CaMV35S promoter and the NOS 3' end function efficiently in the expression of heterologous genes in corn cells (Fromm et al., 1986).

[0130] Referring to Figure 15, a plant expression cassette vector (pMON744) was constructed that contains these sequences. Specifically, the expression cassette contains the enhanced CaMV 35S promoter followed by intron 1 of the corn Adh1 gene (Callis et al., 1987). This is followed by a multilinker cloning site for insertion of coding sequences; this multilinker contains a BglII site among others. Following the multilinker is the NOS 3' end. pMON744 also contains the selectable marker gene 35S/NPTII/NOS 3' for kanamycin selection of transgenic corn cells. In addition, pMON744 has an E. coli origin of replication and an ampicillin resistance gene for selection of the plasmid in E. coli.
[0131] Five B.t.k. coding sequences described in the previous examples were inserted into the BglII site of pMON744 for corn cell expression of B.t.k. The coding sequences inserted and the resulting vectors were:

1. Wild-type B.t.k. HD-1 from pMON9921 to make pMON8652.

2. Modified B.t.k. HD-1 from pMON5370 to make pMON8642.

3. Synthetic B.t.k. HD-1 from pMON5377 to make pMON8643.

4. Synthetic B.t.k. HD-73 from pMON5390 to make pMON8644.

5. Synthetic full length B.t.k. HD-73 from pMON10518 to make pMON10902.

[0132] pMON8652 (wild-type B.t.k. HD-1) was used to transform corn cell protoplasts, and stably transformed kanamycin resistant callus was isolated. B.t.k. mRNA in the corn cells was analyzed by nuclease S1 protection and found to be present at a level comparable to that seen with the same wild-type coding sequence (pMON9921) in transgenic tomato plants.

[0133] pMON8652 and pMON8642 (modified HD-1) were used to transform corn cell protoplasts in a transient expression system. The level of B.t.k. mRNA was analyzed by nuclease S1 protection. The modified HD-1 gave rise to a several-fold increase in B.t.k. mRNA compared to the wild-type coding sequence in the transiently transformed corn cells. This indicated that the modifications introduced into the B.t.k. HD-1 gene are capable of enhancing B.t.k. expression in monocot cells, as was demonstrated for dicot plants and cells.

[0134] pMON8642 (modified HD-1) and pMON8643 (synthetic HD-1) were used to transform Black Mexican Sweet (BMS) corn cell protoplasts by PEG-mediated DNA uptake, and stably transformed corn callus was selected by growth on kanamycin-containing plant growth medium. Individual callus colonies that were derived from single transformed cells were isolated and propagated separately on kanamycin-containing medium.

[0135] To assess the expression of the B.t.k. genes in these cells, callus samples were tested for insect toxicity by bioassay against tobacco hornworm larvae. For each vector, 96 callus lines were tested. Portions of each callus were placed on sterile water agar plates, and five neonate tobacco hornworm larvae were added and allowed to feed for 4 days. For pMON8643, 100% of the larvae died after feeding on 15 of the 96 calli, and these calli showed little feeding damage. For pMON8642, only 1 of the 96 calli was toxic to the larvae. This showed that the B.t.k. gene was being expressed in these samples at insecticidal levels. The observation that significantly more calli containing pMON8643 were toxic than calli containing pMON8642 showed that significantly higher levels of expression were obtained when the synthetic HD-1 coding sequence was contained in corn cells than when the modified HD-1 coding sequence was used, similar to the previous examples with dicot plants. A semiquantitative immunoassay showed that the pMON8643 toxic samples had significantly higher B.t.k. protein levels than the pMON8642 toxic sample.
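The paragraph above reports 15 of 96 toxic calli for pMON8643 (about 15.6%) against 1 of 96 for pMON8642 (about 1%), but it does not name the statistical test behind "significantly more". As an illustration only, Fisher's exact test (swapped in here; the source does not state which test was used) can be applied to those counts. The function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(toxic_1: int, n_1: int, toxic_2: int, n_2: int) -> float:
    """One-sided Fisher's exact test.

    With both margins fixed (total calli, total toxic calli), return the
    hypergeometric probability that group 1 contains at least `toxic_1`
    of the toxic calli.
    """
    total = n_1 + n_2
    toxic = toxic_1 + toxic_2
    tail = 0
    for k in range(toxic_1, min(toxic, n_1) + 1):
        # Ways to place k toxic and (n_1 - k) non-toxic calli in group 1.
        tail += comb(toxic, k) * comb(total - toxic, n_1 - k)
    return tail / comb(total, n_1)


# pMON8643 (synthetic HD-1) vs pMON8642 (modified HD-1), counts from [0135].
p = fisher_one_sided(15, 96, 1, 96)
print(f"{15 / 96:.1%} vs {1 / 96:.1%} toxic calli, one-sided p = {p:.1e}")
```

With these counts the one-sided p-value falls well below 0.001. Because both groups contain 96 calli, the null distribution is symmetric, so a two-sided test would give exactly twice this value.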
[0136] The 16 callus samples that were toxic to tobacco hornworm were also tested for activity against European corn borer. European corn borer is approximately 40-fold less sensitive to the HD-1 gene product than is tobacco hornworm. Larvae of European corn borer were applied to the callus samples and allowed to feed for 4 days. Two of the 16 calli tested, both of which contained pMON8643 (synthetic HD-1), were toxic to European corn borer larvae.
[0137] To assess the expression of the B.t.k. genes in differentiated corn tissue, another method of DNA delivery was used. Young leaves were excised from corn plants, and DNA samples were delivered into the leaf tissue by microprojectile bombardment. In this system, the DNA on the microprojectiles is transiently expressed in the leaf cells after bombardment. Three DNA samples were used, and each DNA was tested in triplicate:

1. pMON744, the corn expression vector with no B.t.k. gene.

2. pMON8643 (synthetic HD-1).

3. pMON752, a corn expression vector for the GUS gene, with no B.t.k. gene.

[0138] The leaves were incubated at room temperature for 24 hours. The pMON752 samples were stained with a substrate that allows visual detection of the GUS gene product. This analysis showed that over one hundred spots in each sample were expressing the GUS product and that the triplicate samples showed very similar levels of GUS expression. For the pMON744 and pMON8643 samples, 5 tobacco hornworm larvae were added to each leaf and allowed to feed for 48 hours. All three samples bombarded with pMON744 showed extensive feeding damage and no larval mortality. All three samples bombarded with pMON8643 showed no evidence of feeding damage and 100% larval mortality. The samples were also assayed for the presence of B.t.k. protein by a qualitative immunoassay. All of the pMON8643 samples had detectable B.t.k. protein. These results demonstrated that the synthetic B.t.k. gene was expressed in differentiated corn plant tissue at insecticidal levels.

Example 9 - Expression of Synthetic B.t. Genes with RUBISCO Small Subunit Promoters and Chloroplast Transit Peptides

[0139] The genes in plants encoding the small subunit of RUBISCO (SSU) are often highly expressed, light regulated and sometimes show tissue specificity. These expression properties are largely due to the promoter sequences of these genes. It has been possible to use SSU promoters to express heterologous genes in transformed plants. Typically a plant will contain multiple SSU genes, and the expression levels and tissue specificity of different SSU genes will be different. The SSU proteins are encoded in the nucleus and synthesized in the cytoplasm as precursors that contain an N-terminal extension known as the chloroplast transit peptide (CTP). The CTP directs the precursor to the chloroplast and promotes the uptake of the SSU protein into the chloroplast. In this process, the CTP is cleaved from the SSU protein. These CTP sequences have been used to direct heterologous proteins into chloroplasts of transformed plants.
[0140] The SSU promoters might have several advantages for expression of B.t.k. genes in plants. Some SSU promoters are very highly expressed and could give rise to expression levels as high as or higher than those observed with the CaMV35S promoter. The tissue distribution of expression from SSU promoters is different from that of the CaMV35S promoter, so for control of some insect pests it may be advantageous to direct the expression of B.t.k. to those cells in which SSU is most highly expressed. For example, although relatively constitutive, in the leaf the CaMV35S promoter is more highly expressed in vascular tissue than in some other parts of the leaf, while most SSU promoters are most highly expressed in the mesophyll cells of the leaf. Some SSU promoters are also more highly tissue specific, so it could be possible to utilize a specific SSU promoter to express B.t.k. in only a subset of plant tissues if, for example, B.t. expression in certain cells was found to be deleterious to those cells. For control of Colorado potato beetle in potato, for instance, it may be advantageous to use SSU promoters to direct B.t. expression to the leaves but not to the edible tubers.

[0141] Utilizing SSU CTP sequences to localize B.t. proteins to the chloroplast might also be advantageous. Localization of the B.t. protein to the chloroplast could protect it from proteases found in the cytoplasm. This could stabilize the B.t. protein and lead to higher levels of accumulation of active protein. B.t. genes containing the CTP could be used in combination with the SSU promoter or with other promoters such as CaMV35S.

[0142] A variety of plant transformation vectors were constructed for the expression of B.t.k. genes utilizing SSU promoters and SSU CTPs. The promoters and CTPs utilized were from the petunia SSU11a gene described by Turner et al. (1986) and from the Arabidopsis ats1A gene (an SSU gene) described by Krebbers et al. (1988) and by Elionor et al. (1989). The petunia SSU11a promoter was contained on a DNA fragment that extended approximately 800 bp upstream of the SSU coding sequence. The Arabidopsis ats1A promoter was contained on a DNA fragment that extended approximately 1.8 kb upstream of the SSU coding sequence. At the upstream end, convenient sites from the multilinker of pUC18 were used to move these promoters into plant transformation vectors such as pMON893. These promoter fragments extended to the start of the SSU coding sequence, at which point an NcoI restriction site was engineered to allow insertion of the B.t. coding sequence, replacing the SSU coding sequence.
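The NcoI recognition site CCATGG contains ATG, so an NcoI site engineered at the start of a coding sequence places the initiation codon inside the site, provided the base after the ATG is G (all three coding sequences listed in claim 10 begin with ATGG). A minimal sketch of the cut-and-ligate step described above, using illustrative fragments: the promoter end `GGTACCAAGCTTCCATGG` is made up, and the downstream fragment is CC followed by the first 15 bases of sequence B in claim 10. The helper names are hypothetical.

```python
NCOI_SITE = "CCATGG"
NCOI_CUT = 1  # NcoI cuts the top strand between C and CATGG (C^CATGG)


def ncoi_fusion(upstream: str, downstream: str) -> str:
    """Join two NcoI-cut fragments.

    The upstream fragment (e.g. a promoter segment) is cut at its last NcoI
    site and the downstream fragment (e.g. a coding sequence) at its first;
    the matching CATG overhangs re-form an intact CCATGG at the junction.
    """
    i = upstream.rfind(NCOI_SITE)
    j = downstream.find(NCOI_SITE)
    if i < 0 or j < 0:
        raise ValueError("both fragments must contain an NcoI site (CCATGG)")
    return upstream[: i + NCOI_CUT] + downstream[j + NCOI_CUT :]


def start_codon_index(fused: str) -> int:
    """Index of the ATG embedded in the first NcoI site of `fused`."""
    return fused.find(NCOI_SITE) + 2


promoter_end = "GGTACCAAGCTTCCATGG"  # hypothetical promoter 3' end
bt_start = "CCATGGCCATTGAAACC"      # CC + start of sequence B (claim 10)
fused = ncoi_fusion(promoter_end, bt_start)
print(fused)                             # GGTACCAAGCTTCCATGGCCATTGAAACC
print(fused[start_codon_index(fused):])  # ATGGCCATTGAAACC
```

Because the site is regenerated at the junction, the same fragment can later be re-cut with NcoI, which is what allows the coding sequence to be swapped for another.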
[0143] When SSU promoters were used in combination with their CTP, the DNA fragments extended through the coding sequence of the CTP and a small portion of the mature SSU coding sequence, at which point an NcoI restriction site was engineered by standard techniques to allow the in-frame fusion of B.t. coding sequences with the CTP. In particular, for the petunia SSU11a CTP, B.t. coding sequences were fused to the SSU sequence after amino acid 8 of the mature SSU sequence, at which point the NcoI site was placed. The 8 amino acids of mature SSU sequence were included because preliminary in vitro chloroplast uptake experiments indicated that uptake of B.t.k. was observed only if this segment of mature SSU was included. For the Arabidopsis ats1A CTP, the complete CTP was included plus 24 amino acids of mature SSU sequence plus the sequence gly-gly-arg-val-asn-cys-met-gln-ala-met, terminating in an NcoI site for B.t. fusion. This short sequence reiterates the native SSU CTP cleavage site (between the cys and met) plus a short segment surrounding the cleavage site. This sequence was included in order to ensure proper uptake into chloroplasts. B.t. coding sequences were fused to this ats1A CTP after the met codon. In vitro uptake experiments with this CTP construction and other (non-B.t.) coding sequences showed that this CTP did target proteins to the chloroplast.
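The ten-residue linker in [0143] reiterates the native CTP cleavage site between cys and met. A minimal sketch, assuming processing happens only at that Cys-Met bond, shows which linker residues end up on each side. Whether the final Met also serves as the B.t. initiator is not modeled, and `split_at_site` and `one_letter` are illustrative names.

```python
# One-letter codes for the residues that occur in the ats1A linker.
ONE_LETTER = {
    "gly": "G", "arg": "R", "val": "V", "asn": "N",
    "cys": "C", "met": "M", "gln": "Q", "ala": "A",
}

LINKER = "gly-gly-arg-val-asn-cys-met-gln-ala-met"


def split_at_site(peptide: str, left: str = "cys", right: str = "met"):
    """Split a dash-separated peptide at the first left|right peptide bond."""
    residues = peptide.split("-")
    for i in range(len(residues) - 1):
        if residues[i] == left and residues[i + 1] == right:
            return residues[: i + 1], residues[i + 1 :]
    raise ValueError(f"no {left}-{right} bond in {peptide}")


def one_letter(residues) -> str:
    return "".join(ONE_LETTER[r] for r in residues)


before, after = split_at_site(LINKER)
print(one_letter(before), "|", one_letter(after))  # GGRVNC | MQAM
```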

[0144] When CTPs were used in combination with the CaMV35S promoter, the same CTP segments were used. They were excised just upstream of the ATG start sites of the CTP by engineering of BglII sites, and placed downstream of the CaMV35S promoter in pMON893 as BglII to NcoI fragments. B.t. coding sequences were fused as described above.

[0145] The wild-type B.t.k. HD-1 coding sequence of pMON9921 (see Figure 1) was fused to the ats1A promoter to make pMON1925, or to the ats1A promoter plus CTP to make pMON1921. These vectors were used to transform tobacco plants, and the plants were screened for activity against tobacco hornworm. No toxic plants were recovered. This is surprising in light of the fact that toxic plants could be recovered, albeit at a low frequency, after transformation with pMON9921, in which the B.t.k. coding sequence was expressed from the enhanced CaMV35S promoter in pMON893, and in light of the fact that Elionor et al. (1989) report that the ats1A promoter itself is comparable in strength to the CaMV35S promoter and approximately 10-fold stronger when the CTP sequence is included. At least for the wild-type B.t.k. HD-1 coding sequence, this does not appear to be the case.

[0146] A variety of plant transformation vectors were constructed utilizing either the truncated synthetic B.t.k. HD-73 coding sequence of Figure 4 or the full length B.t.k. HD-73 coding sequence of Figure 11. These are listed in the table below.
Table XV

Gene Constructs with CTPs

Vector       Promoter   CTP      B.t.k. HD-73 coding sequence
pMON10806    En 35S     ats1A    truncated
pMON10814    En 35S     SSU11a   full length
pMON10811    SSU11a     SSU11a   truncated
pMON10819    SSU11a     none     truncated
pMON10815    ats1A      none     truncated
pMON10817    ats1A      ats1A    truncated
pMON10821    En 35S     ats1A    truncated
pMON10822    En 35S     ats1A    full length
pMON10838    SSU11a     SSU11a   full length
pMON10839    ats1A      ats1A    full length
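As a minimal sketch, the rows of Table XV can be held as records and queried, for example to pick out the truncated constructs driven by the ats1A promoter that are discussed in [0147]. The `Construct` class and `select` helper are illustrative names, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    vector: str
    promoter: str
    ctp: str     # "none" when no chloroplast transit peptide is fused
    coding: str  # B.t.k. HD-73 coding sequence: "truncated" or "full length"


# Rows transcribed from Table XV.
TABLE_XV = [
    Construct("pMON10806", "En 35S", "ats1A", "truncated"),
    Construct("pMON10814", "En 35S", "SSU11a", "full length"),
    Construct("pMON10811", "SSU11a", "SSU11a", "truncated"),
    Construct("pMON10819", "SSU11a", "none", "truncated"),
    Construct("pMON10815", "ats1A", "none", "truncated"),
    Construct("pMON10817", "ats1A", "ats1A", "truncated"),
    Construct("pMON10821", "En 35S", "ats1A", "truncated"),
    Construct("pMON10822", "En 35S", "ats1A", "full length"),
    Construct("pMON10838", "SSU11a", "SSU11a", "full length"),
    Construct("pMON10839", "ats1A", "ats1A", "full length"),
]


def select(rows, **criteria):
    """Vector names of rows whose fields equal every keyword criterion."""
    return [r.vector for r in rows
            if all(getattr(r, k) == v for k, v in criteria.items())]


print(select(TABLE_XV, promoter="ats1A", coding="truncated"))
# ['pMON10815', 'pMON10817']
```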

[0147] All of the above vectors were used to transform tobacco plants. For all of the vectors containing truncated B.t.k. genes, leaf tissue from these plants has been analyzed for toxicity to insects and for B.t.k. protein levels by immunoassay. pMON10806, 10811, 10819 and 10821 produce levels of B.t.k. protein comparable to pMON5383 and pMON5390, which contain synthetic B.t.k. HD-73 coding sequences driven by the En 35S promoter itself with no CTP. These plants also have the insecticidal activity expected for the B.t.k. protein levels detected. For pMON10815 and pMON10817 (containing the ats1A promoter), the level of B.t.k. protein is about 5-fold higher than that found in plants containing pMON5383 or 5390. These plants also have higher insecticidal activity. Plants containing pMON10815 and 10817 contain up to 1% of their total soluble leaf protein as B.t.k. HD-73. This is the highest level of B.t.k. protein yet obtained with any of the synthetic genes.

[0148] This result is surprising in two respects. First, as noted above, the wild-type coding sequences fused to the ats1A promoter and CTP did not show any evidence of higher levels of expression than for En 35S, and in fact had lower expression based on the absence of any insecticidal plants. Second, Elionor et al. (1989) show that for two other genes, the ats1A CTP can increase expression from the ats1A promoter by about 10-fold. For the synthetic B.t.k. HD-73 gene, there is no consistent increase seen by including the CTP over and above that seen for the ats1A promoter alone.

[0149] Tobacco plants containing the full length synthetic HD-73 fused to the SSU11a CTP and driven by the En 35S promoter (pMON10814) produced levels of B.t.k. protein and insecticidal activity comparable to pMON10518, which does not include the CTP. In addition, for pMON10518 the B.t.k. protein extracted from plants was observed by gel electrophoresis to contain multiple forms less than full length, apparently due to cleavage of the C-terminal portion (not required for toxicity) in the cytoplasm. For pMON10814, the majority of the protein appeared to be intact full length, indicating that the protein has been stabilized from proteolysis by targeting to the chloroplast.

Example 10 - Targeting of B.t. Proteins to the Extracellular Space or Vacuole through the Use of Signal Peptides

[0150] The B.t. proteins produced from the synthetic genes described here are localized to the cytoplasm of the plant cell, and this cytoplasmic localization results in plants that are insecticidally effective. It may be advantageous for some purposes to direct the B.t. proteins to other compartments of the plant cell. Localizing B.t. proteins in compartments other than the cytoplasm may result in less exposure of the B.t. proteins to cytoplasmic proteases, leading to greater accumulation of the protein and enhanced insecticidal activity. Extracellular localization could lead to more efficient exposure of certain insects to the B.t. proteins, leading to greater efficacy. If a B.t. protein were found to be deleterious to plant cell function, then localization to a noncytoplasmic compartment could protect these cells from the protein.
[0151] In plants as well as other eukaryotes, proteins that are destined to be localized either extracellularly or in several specific compartments are typically synthesized with an N-terminal amino acid extension known as the signal peptide. This signal peptide directs the protein to enter the compartmentalization pathway, and it is typically cleaved from the mature protein as an early step in compartmentalization. For an extracellular protein, the secretory pathway typically involves cotranslational insertion into the endoplasmic reticulum, with cleavage of the signal peptide occurring at this stage. The mature protein then passes through the Golgi body into vesicles that fuse with the plasma membrane, thus releasing the protein into the extracellular space. Proteins destined for other compartments follow a similar pathway. For example, proteins that are destined for the endoplasmic reticulum or the Golgi body follow this scheme, but they are specifically retained in the appropriate compartment. In plants, some proteins are also targeted to the vacuole, another membrane-bound compartment in the cytoplasm of many plant cells. Vacuole-targeted proteins diverge from the above pathway at the Golgi body, where they enter vesicles that fuse with the vacuole.

[0152] A common feature of this protein targeting is the signal peptide that initiates the compartmentalization process. Fusing a signal peptide to a protein will in many cases lead to the targeting of that protein to the endoplasmic reticulum. The efficiency of this step may also depend on the sequence of the mature protein itself. The signals that direct a protein to a specific compartment rather than to the extracellular space are not as clearly defined. It appears that many of the signals that direct the protein to specific compartments are contained within the amino acid sequence of the mature protein. This has been shown for some vacuole-targeted proteins, but it is not yet possible to define these sequences precisely. It appears that secretion into the extracellular space is the "default" pathway for a protein that contains a signal sequence but no other compartmentalization signals. Thus, a strategy to direct B.t. proteins out of the cytoplasm is to fuse synthetic B.t. genes to DNA sequences encoding known plant signal peptides. These fusion genes will give rise to B.t. proteins that enter the secretory pathway and lead to extracellular secretion or targeting to the vacuole or other compartments.

[0153] Signal sequences for several plant genes have been described. One such sequence is for the tobacco pathogenesis related protein PR1b described by Cornelissen et al. The PR1b protein is normally localized to the extracellular space. Another type of signal peptide is contained on seed storage proteins of legumes. These proteins are localized to the protein body of seeds, which is a vacuole-like compartment found in seeds. A signal peptide DNA sequence for the beta subunit of the 7S storage protein of common bean (Phaseolus vulgaris), PvuB, has been described by Doyle et al. Based on these published sequences, genes were synthesized by chemical synthesis of oligonucleotides that encoded the signal peptides for PR1b and PvuB. The synthetic genes for these signal peptides corresponded exactly to the reported DNA sequences. Just upstream of the translational initiation codon of each signal peptide, a BamHI and a BglII site were inserted, with the BamHI site at the 5' end. This allowed the insertion of the signal peptide encoding segments into the BglII site of pMON893 for expression from the En 35S promoter. In some cases, to achieve secretion or compartmentalization of heterologous proteins, it has proved necessary to include some amino acid sequence beyond the normal cleavage site of the signal peptide. This may be necessary to ensure proper cleavage of the signal peptide. For PR1b, the synthetic DNA sequence also included the first 10 amino acids of mature PR1b. For PvuB, the synthetic DNA sequence included the first 13 amino acids of mature PvuB. Both synthetic signal peptide encoding segments ended with NcoI sites to allow fusion in frame to the methionine initiation codon of the synthetic B.t. genes.
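The cloning steps in [0153] and [0144] rest on standard sticky-end rules. BamHI (G^GATCC) and BglII (A^GATCT) leave the same GATC overhang, so their ends ligate to each other, and a mixed BamHI/BglII junction is cut by neither enzyme. NcoI (C^CATGG) leaves a different overhang, CATG. The sketch below encodes only those rules; which site was actually used to excise each fragment is not stated here, and the helper names are hypothetical.

```python
# Recognition site and top-strand cut offset for each enzyme.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
    "NcoI": ("CCATGG", 1),   # C^CATGG
}


def overhang(enzyme: str) -> str:
    """5' overhang of a palindromic six-base site cut symmetrically on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut : len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """Sticky ends can be ligated when their overhangs are identical."""
    return overhang(a) == overhang(b)


def junction(left: str, right: str) -> str:
    """Top-strand site formed when a `left`-cut end is ligated to a `right`-cut end."""
    lsite, lcut = ENZYMES[left]
    rsite, rcut = ENZYMES[right]
    return lsite[:lcut] + rsite[rcut:]


def recognized_by(seq: str) -> list:
    """Enzymes (from the table above) whose recognition site occurs in `seq`."""
    return sorted(name for name, (site, _) in ENZYMES.items() if site in seq)


print(overhang("BamHI"), overhang("BglII"), overhang("NcoI"))  # GATC GATC CATG
print(junction("BamHI", "BglII"), recognized_by(junction("BamHI", "BglII")))  # GGATCT []
```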

[0154] Four vectors encoding synthetic B.t.k. HD-73 genes were constructed containing these signal peptides. The synthetic truncated HD-73 gene from pMON5383 was fused with the signal peptide sequence of PvuB and incorporated into pMON893 to create pMON10827. The synthetic truncated HD-73 gene from pMON5383 was also fused with the signal peptide sequence of PR1b to create pMON10824. The full length synthetic HD-73 gene from pMON10518 was fused with the signal peptide sequence of PvuB and incorporated into pMON893 to create pMON10828. The full length synthetic HD-73 gene from pMON10518 was also fused with the signal peptide sequence of PR1b and incorporated into pMON893 to create pMON10825.

[0155] These vectors were used to transform tobacco plants, and the plants were assayed for expression of the B.t.k. protein by Western blot analysis and for insecticidal efficacy. pMON10824 and pMON10827 produced amounts of B.t.k. protein in leaf comparable to the truncated HD-73 vectors, pMON5383 and pMON5390. pMON10825 and pMON10828 produced full length B.t.k. protein in amounts comparable to pMON10518. In all cases, the plants were insecticidally active against tobacco hornworm.
Claims

1. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants which comprises:

a) identifying regions within said sequence with greater than four consecutive adenine or thymine nucleotides;

b) modifying the regions of step (a) which have two or more polyadenylation signals within a ten base sequence to remove said signals while maintaining a gene sequence which encodes said protein; and

c) modifying the 15-30 base regions surrounding the regions of step (a) to remove major plant polyadenylation signals, consecutive sequences containing more than one minor polyadenylation signal and consecutive sequences containing more than one ATTTA sequence while maintaining a gene sequence which encodes said protein.
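Steps (a) and (b) of claim 1 can be sketched as a simple sequence scan, taking the candidate signals to be those listed in claim 5. Two interpretive assumptions are made here: a signal is counted against an A/T run when its six bases overlap the run, and "two or more polyadenylation signals within a ten base sequence" is read as two complete six-base signals fitting inside one 10-base window. Step (c), the 15-30 base flanking regions, is not modeled. All function names are hypothetical.

```python
import re

# Candidate plant polyadenylation signals as listed in claim 5 (all six bases).
POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA", "ATGAAA",
    "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)
SIGNAL_LEN = 6


def at_runs(seq: str, min_len: int = 5) -> list:
    """Step (a): (start, end) of each run of more than four consecutive A/T bases."""
    return [(m.start(), m.end()) for m in re.finditer("[AT]{%d,}" % min_len, seq)]


def signal_starts(seq: str) -> list:
    """Start positions of all listed signals, overlapping matches included."""
    return [i for i in range(len(seq) - SIGNAL_LEN + 1)
            if seq[i : i + SIGNAL_LEN] in POLYA_SIGNALS]


def flag_regions(seq: str, window: int = 10) -> list:
    """Step (b), one reading: A/T runs overlapped by two or more signals
    that both fit inside a single `window`-base stretch."""
    seq = seq.upper()
    starts = signal_starts(seq)
    flagged = []
    for run_start, run_end in at_runs(seq):
        # Signals whose six bases overlap this run (starts are already sorted).
        hits = [s for s in starts if s < run_end and s + SIGNAL_LEN > run_start]
        # Two six-base signals fit in a 10-base window when starts differ by <= 4.
        if any(b - a <= window - SIGNAL_LEN for a, b in zip(hits, hits[1:])):
            flagged.append((run_start, run_end))
    return flagged


# Two overlapping AATAAA signals inside one A/T run are flagged ...
print(flag_regions("GCGCAATAAATAAAGCGC"))        # [(4, 14)]
# ... but runs carrying a single signal each are not.
print(flag_regions("GCGCAATAAAGCGCGCAATAAAGC"))  # []
```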


2. A method for modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus thuringiensis to enhance the expression of said protein in plants which comprises:

a) removing polyadenylation signals contained in said wild-type gene while retaining a sequence which encodes said protein; and

b) removing ATTTA sequences contained in said wild-type gene while retaining a sequence which encodes said protein.

3. A method of claim 2 further comprising the removal of self-complementary sequences and replacement of such sequences with non-self-complementary DNA comprising plant preferred codons while retaining a structural gene sequence encoding said protein.
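Claim 3 does not define "self-complementary". One common reading is an inverted repeat that could fold into a hairpin stem. The sketch below scans for perfect inverted repeats. The stem length of 8, minimum loop of 3 and 60-base span are arbitrary assumptions, not values from the claims, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, stem: int = 8, min_loop: int = 3, max_span: int = 60) -> list:
    """Return (i, j) pairs where seq[i:i+stem] can pair antiparallel with
    seq[j:j+stem], with at least `min_loop` unpaired bases between them."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        partner = reverse_complement(seq[i : i + stem])
        last_j = min(len(seq) - stem, i + max_span)
        for j in range(i + stem + min_loop, last_j + 1):
            if seq[j : j + stem] == partner:
                hits.append((i, j))
    return hits


# A hypothetical hairpin: an 8-base stem, a 4-base loop, and the paired stem.
hairpin = "GGATCCAT" + "GAAA" + "ATGGATCC"
print(inverted_repeats(hairpin))  # [(0, 12)]
```

The scan is a naive O(n x span) loop, which is adequate for gene-length sequences; imperfect stems or energy-based folding would need a dedicated RNA folding tool.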

4. A method of claims 1 to 3 further comprising the use of plant preferred sequences in the removal of the polyadenylation signals and ATTTA sequences.
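Whatever replacement codons are chosen, the modified sequence must still encode the same protein. A small check using the standard genetic code: the first 39 nucleotides of sequences A and B in claim 10 differ at seven positions, all of them third codon positions, yet encode the same 13 residues. The sketch does not choose plant preferred codons, which would need a codon usage table that the source does not give, and the function names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate complete codons in reading frame 1."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[p : p + 3]] for p in range(0, len(seq) - 2, 3))


def same_protein(original: str, recoded: str) -> bool:
    """True when a recoded sequence is a synonymous substitute for the original."""
    return len(original) == len(recoded) and translate(original) == translate(recoded)


# Nucleotides 1-39 of sequences A and B in claim 10.
SEQ_A_1_39 = "ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCC"
SEQ_B_1_39 = "ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCC"

print(translate(SEQ_A_1_39))                   # MAIETGYTPIDIS
print(same_protein(SEQ_A_1_39, SEQ_B_1_39))    # True
print([p for p in range(39) if SEQ_A_1_39[p] != SEQ_B_1_39[p]])  # [5, 8, 14, 23, 26, 32, 35]
```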

5. A method of claims 1 to 3 in which the plant polyadenylation signals are selected from the group consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.

6. A method for improving the expression of a heterologous gene in plants wherein said gene comprises a modified chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence so that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and said structural coding sequence does not contain more than 5 consecutive nucleotides consisting of either adenine or thymine residues.

7. A method for improving the expression of a heterologous gene in plants wherein said gene comprises a modified chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence so that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and has the following characteristics:

said structural coding sequence has a region which is complementary to the following sequence:
GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC

said region in said coding sequence having eliminated 2 AACCAA sequences and 1 AATTAA sequence.
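If "complementary" is read as the reverse complement, the claim-7 sequence corresponds to positions 163-210 of sequence A in claim 10, and that stretch contains neither AACCAA nor AATTAA. A quick check of this reading; the names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


CLAIM7_SEQ = "GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC"
# Positions 163-200 and 201-210 of sequence A in claim 10.
SEQ_A_163_210 = "GAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG" + "GAATCAAGCC"

region = reverse_complement(CLAIM7_SEQ)
print(region == SEQ_A_163_210)  # True
for motif in ("AACCAA", "AATTAA"):
    print(motif, region.count(motif))
```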

8. A method according to claim 7, wherein said structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis kurstaki HD-1.

9. A method according to claim 7 or 8, wherein the plant is a tobacco plant. 

10. A modified chimeric gene containing a promoter which functions in plant cells operably linked to a structural coding sequence and a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a portion of which was derived from a Bacillus thuringiensis protein, wherein said structural coding sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and is selected from:

A. A structural gene which encodes an insecticidal protein of B.t.k. HD-1 having the sequence:
1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
81 TGCTGGATTTGTGTTAGGACTAGTTGATATTATCTGGGGA 120
121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG 200
201 GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTCCTCTCCG 400
401 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG 440
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 560
561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600
601 ATCAGGTACAACCAGTTCAGAAGAGAGCTTACACTAACTG 640
641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680
681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800
801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840
841 ACGGATGCTCATAGAGGAGAATACTACTGGTCCGGTCACC 880
881 AGATCATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT 1040
1041 CGGGATCAACAACCAACAACTATCTGTTCTTGACGGGACA 1080
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
1281 CTCTTGGATACATCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 CCTTCATCACAAATCACCCAAATCCCACTCACCAAGTCTA 1360
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400
1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440
1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520
1521 AAACCTTCAGTTCCACACATCAATTGACGGAAGACCTATT 1560
1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600
1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640
1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680
1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743;

B. A structural gene which encodes an insecticidal protein of B.t.k. HD-73 having the sequence:

1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 40
41 TGTCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGG 80
81 TGCTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGT 120
121 ATCTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAA 160
161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200
201 GAACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTC 240
241 TACCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG 280
281 ATCCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCA 320
321 ATTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA 360
361 TTGTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCG 400
401 TGTACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCG 440
441 AGACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCT 480
481 GCAACCATCAATAGCCGTTACAACGACCTTACTAGGCTGA 520
521 TTGGAAACTACACCGACCACGCTGTTCGTTGGTACAACAC 560
561 TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGG 600
601 ATTAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAG 640
641 TTTTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAG 680
681 AACCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAA 720
721 ATCTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCT 760
761 TCCGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAG 800
801 CCCACACTTGATGGACATCTTGAACAGCATAACTATCTAC 840
841 ACCGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACC 880
881 AGATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTT 920
921 TACCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCA 960
961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA 1000
1001 GAACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATAT 1040
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACA 1080
1081 GAGTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTG 1120
1121 TTTACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAAT 1160
1161 CCCACCACAGAACAACAATGTGCCACCCAGGCAAGGATTC 1200
1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 1240
1241 TCAGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTT 1280
1281 CTCTTGGATACACCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 GCATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATT 1400
1401 CACTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAAT 1440
1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480
1481 TCCCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTA 1520
1521 TGCTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 1560
1561 AATTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTA 1600
1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTT 1640
1641 TGAAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATC 1680
1681 GTGGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTA 1720
1721 TCGACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGA 1760
1761 GGCTGAG 1767.

C. A structural gene encoding a insecticidal protein of B.tk. HD-1 having the sequence: 



1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATC&AGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCTTCGAGGCTG 1840
1841 AGTAC 1845.

D. A structural gene encoding an insecticidal protein derived from B.t.k. HD-73 having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200

EP 0 385 962 B1 

• • • * 

'201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240 

5 

• • • • 

241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280 

• • • « 

io 281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320 

• • • • 

321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360 

• • • • 

3 61 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400 
401 TC AACGAC ATGAACAGCGCCTTGACCAC AGCTATCCC ATT 440 

4 41 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480 

• • • • 
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520 

• • * • 

521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560 

• • # * 

561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600 

• • • • 

35 601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640 

« « • • 

641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 68 0 

• • • * 
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720 

• • • • 

721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760 
761 CCTACCCTATCCGTACAGTGTCCCAACTTACC AGAGAAAT 800 

50 
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'801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840 

5 

• • • » 

841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880 

• • • • 

10 881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920 

• • • • 

921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960 

• • • • 

961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000 

• • • • 

1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040 

• • • • 

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080 

• • • • 

1081 ACCT7GTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120 

1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160 

• • • • 

1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200 

• • • * 

3S !201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240 

• • • • 

1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280 

• • • • 

1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320 

, « • • 

1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360 

• • • 

13 61 CT7GGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400 
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• • • . 

- '1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 

• ♦ • • 

1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480 

• « • 9 

1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520 

• • • * 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1550 

• • • a 

15 61 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600 

1601 CTTCTGTGACCCCTATTCACCTC AACGTTAATTGGGGTAA 1640 

1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1630 

1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720 

• • • • 

1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 17 60 

• • • ■ 

1761 GGGTGTT AG AAACTTTAGTGGGACTGCAGGAGTG ATT ATC 1800 

• • • • 

1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840 

• • • • 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880 

• • • « 

1881 CTGTTTACGTCTACAAACCAGCTTGGACTCAAGACAAATG 1920. 
1921 G 1921J 

E. A structural gene encoding the full-length insecticidal protein of B.t.k. HD-73 having the sequence:



1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCAAT 1920
1921 GTGACGGATTATCATATTGATCAAGTGTCCAACTTGGTGA 1960
1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAGAAGT 2680
2681 TGGAATGGGAGACCAACATCGTCTACAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.

F. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73 having the sequence:
1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.

G. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73 having the sequence:
1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880
1881 CCTCTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAAC 1920
1921 GTTACTGACTATCACATTGACCAAGTGTCCAACTTGGTCA 1960
1961 CCTACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGA 2000
2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC 2040
2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA 2080
2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAT 2120
2121 CACCATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC 2160
2161 GTCACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCT 2200
2201 ACTTGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTT 2240
2241 CACCAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAA 2280
2281 GACCTTGAAATCTACTCGATCAGGTACAATGCCAAGCACG 2320
2321 AGACCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACT 2360
2361 TTCTGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAAC 2400
2401 AGATGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACT 2440
2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCA 2480
2481 TCACTTCTCCTTGGACATCGATGTGGGATGTACTGACCTG 2520
2521 AATGAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGA 2560
2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600
2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640
2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680
2681 TCGAATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGA 2720
2721 GTCCGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAG 2760
2761 TTGCAAGCCGACACCAACATCGCCATGATCCACGCCGCAG 2800
2801 ACAAACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGA 2840
2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880
2881 GAACTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAG 3000
3001 GAACAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGT 3040
3041 GGGAAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGG 3080
3081 TAGAGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGA 3120
3121 TACGGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACA 3160
3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200
3201 AATCTATCCCAACAACACCGTTACTTGCAACGACTACACT 3240
3241 GTGAATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTA 3280
3281 ACAGAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTA 3320
3321 TGCCTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGA 3360
3361 CGTGAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACT 3400
3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440
3441 GTACTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480
3481 GAAACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTC 3520
3521 TCTTGATGGAGGAA 3534.

H. A structural gene which encodes an insecticidal protein of B.t.t. having the sequence:
1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40
41 CCACTAAGGATGTTATCCAGAAGGGTATCTCCGTTGTGGG 80
81 AGACCTCTTGGGCGTGGTTGGATTTCCCTTCGGTGGAGCC 120
121 CTCGTGAGCTTCTATACAAACTTTCTCAACACCATTTGGC 160
161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200
201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCCAAGAAC 240
241 AAGGCTTTGGCAGAACTCCAGGGCCTTCAGAACAATGTGG 280
281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320
321 TGTTAGCTCCAGAAATCCTCACAGCCAAGGTAGGATCAGA 360
361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400
401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440
441 CACTACCTATGCTCAAGCTGCCAACACCCACTTGTTTCTC 480
481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACG 520
521 AGAAAGAGGACATTGCTGAGTTCTACAAGCGTCAACTTAA 560
561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600
601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640
641 CTTGGGTGAACTTCAACAGATACAGGAGAGAGATGACCTT 680
681 GACTGTGCTCGATCTTATCGCACTCTTTCCCTTGTACGAT 720
721 GTGAGACTCTACCCAAAGGAAGTGAAAACTGAGCTTACCA 760
761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800
801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840
841 ATTAGGAAACCACATCTCTTCGACTATCTTCACAGAATTC 880
881 AATTCCACACAAGGTTTCAACCAGGATACTATGGTAACGA 920
921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960
961 CCAAGCATTGGATCTAATGACATCATCACATCTCCCTTCT 1000
1001 ATGGTAACAAGTCCAGTGAACCTGTGCAGAACCTTGAGTT 1040
1041 CAACGGCGAGAAAGTCTATAGAGCCGTCGCAAACACCAAT 1080
1081 CTCGCTGTGTGGCCATCCGCAGTTTACTCAGGCGTCACAA 1120
1121 AGGTGGAGTTTAGTCAGTATAACGATCAGACCGATGAGGC 1160
1161 CAGCACCCAGACTTACGACTCCAAACGTAACGTTGGCGCA 1200
1201 GTCTCTTGGGATTCTATCGACCAATTGCCTCCAGAAACCA 1240
1241 CAGACGAACCATTGGAGAAGGGCTACAGCCACCAACTTAA 1280
1281 CTATGTGATGTGCTTCTTGATGCAAGGTTCCAGAGGGACC 1320
1321 ATTCCAGTGTTGACCTGGACACACAAGTCCGTGGACTTCT 1360
1361 TCAACATGATCGATAGCAAGAAGATCACTCAACTTCCCTT 1400
1401 GGTGAAAGCCTACAAGCTGCAATCTGGTGCTTCCGTTGTC 1440
1441 GCAGGTCCCAGATTCACTGGAGGTGACATCATCCAGTGCA 1480
1481 CAGAGAACGGCAGCGCAGCTACTATCTACGTGACACCTGA 1520
1521 TGTGTCTTACTCTCAGAAGTACAGGGCACGTATTCATTAC 1560
1561 GCATCTACCAGCCAGATCACCTTCACACTCAGCTTGGATG 1600
1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640
1641 CAAAGGTGACACTCTCACATACAATAGCTTCAACTTGGCA 1680
1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTTC 1720
1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAGACAAAGTCTA 1760
1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791.

I. A structural gene which encodes an insecticidal protein of B.t. entomocidus having the sequence:

1 ATGGAGGAGAACAACCAAAACCAATGCATTCCATACAACT 40
41 GCTTGAGTAACCCAGAAGAGGTATTGCTTGATGGAGAACG 80
81 CATTTCAACCGGTAACTCTTCCATCGACATCTCCTTGTCC 120
121 TTGGTCCAGTTTCTGGTCAGCAACTTCGTGCCAGGTGGTG 160
161 GGTTCCTTGTCGGACTAATTGACTTCGTCTGGGGTATCGT 200
201 TGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATTGAG 240
241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280
281 CTGCCATCGCTAACTTGGAAGGATTGGGCAATAACTTCAA 320
321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360
361 AACAACCCAGAGACCCGCACTAGGGTGATCGACAGATTCA 400
401 GAATCTTGGACGGCCTCTTGGAGAGAGATATCCCATCCTT 440
441 CAGAATCTCTGGCTTCGAAGTTCCTCTCTTGTCCGTGTAC 480
481 GCTCAAGCAGCTAATCTTCACCTCGCTATCCTTCGAGACA 520
521 GTGTCATCTTTGGGGAAAGGTGGGGATTGACCACTATCAA 560
561 CGTCAATGAGAATTACAACAGACTTATCAGGCACATTGAC 600
601 GAGTACGCCGACCACTGTGCTAACACCTACAACCGTGGCT 640
641 TGAACAATCTCCCTAAGTCTACTTATCAAGATTGGATTAC 680
681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG 720
721 GACATTGCAGCTTTCTTCCCGAACTATGACAACAGGAGAT 760
761 ACCCTATCCAACCAGTGGGTCAACTTACCAGAGAAGTCTA 800
801 TACTGACCCACTTATCAACTTCAACCCTCAGTTGCAAAGT 840
841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC 880
881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT 920
921 TACTATCTTCACCGATTGGTTCAGCGTTGGGCGTAACTTC 960
961 TATTGGGGTGGACACAGGGTCATCTCCTCTCTTATTGGAG 1000
1001 GTGGGAACATTACCTCTCCTATCTATGGACGTGAGGCAAA 1040
1041 CCAGGAGCCACCACGTAGTTTCACCTTCAACGGTCCAGTC 1080
1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAGC 1120
1121 AACCTTGGCCAGCTCCACCTTTCAACCTTAGAGGTGTTGA 1160
1161 GGGCGTTGAGTTCTCTACTCCTACCAACTCCTTCACTTAC 1200
1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC 1240
1241 CAGAGGACAATAGCGTGCCACCCAGGGAAGGCTACTCCCA 1280
1281 CAGGTTGTGCCACGCAACCTTCGTGCAGCGTTCCGGAACT 1320
1321 CCATTCCTCACTACAGGAGTTGTGTTCTCATGGACTGATC 1360
1361 GTAGTGCTACTCTCACTAATACCATTGATCCCGAGAGGAT 1400
1401 CAATCAAATCCCATTGGTCAAGGGTTTCCGTGTGTGGGGA 1440
1441 GGAACTTCTGTCATCACAGGACCAGGCTTCACAGGAGGTG 1480
1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT 1520
1521 CCAAGTTAACATCAACTCTCCAATTACTCAAAGATATCGT 1560
1561 CTCAGGTTTCGTTACGCATCTTCCCGTGACGCTAGAGTCA 1600
1601 TCGTGCTCACCGGAGCAGCTTCTACCGGTGTCGGTGGACA 1640
1641 AGTCTCCGTGAACATGCCACTCCAGAAGACTATGGAGATC 1680
1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT 1720
1721 TCTCTAACCCTTTCAGTTTCCGTGCCAACCCTGACATCAT 1760
1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC 1800
1801 TCATCTGGCGAATTGTACATTGACAAGATTGAGATCATTC 1840
1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG 1880
1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT 1920
1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG 1960
1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT 2000
2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA 2040
2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG 2080
2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG 2120
2121 TTGGAGAGGAAGCACCGACATCACCATCCAAGGAGGCGAC 2160
2161 GATGTGTTCAAGGAGAACTACGTCACCCTCCCAGGAACTG 2200
2201 TGGACGAGTGCTACCCTACCTACTTGTACCAGAAGATCGA 2240
2241 TGAGTCCAAACTCAAAGCCTACACCAGGTATGAACTTAGA 2280
2281 GGCTACATCGAAGACAGCCAAGACCTTGAAATCTACCTCA 2320
2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG 2360
2361 TACTGGTTCCCTCTGGCCACTTTCTGCCCAAATGCCCATT 2400
2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCCACACCTTG 2440
2441 AGTGGAATCCTGACTTGGACTGCTCCTGCAGGGATGGCGA 2480
2481 GAAGTGTGCCCACCATTCTCATCACTTCACCTTGGACATC 2520
2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT 2560
2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG 2600
2601 ACTTGGCAACCTTGAGTTTCTCGAAGAGAAACCATTGCTC 2640
2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT 2680
2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT 2720
2721 CGTTTACAAGGAGGCCAAAGAGTCCGTGGATGCTTTGTTC 2760
2761 GTGAACTCCCAATATGATAGGTTGCAAGTGGACACCAACA 2800
2801 TCGCCATGATCCACGCTGCAGACAAACGTGTGCACAGGAT 2840
2841 TCGTGAGGCTTACTTGCCTGAGTTGTCCGTGATCCCTGGT 2880
2881 GTGAACGCTGCCATCTTCGAGGAACTTGAGGGACGTATCT 2920
2921 TTACCGCA3ACTCCTTGTACGATGCCAGAAACGTCATCAA 2960
2961 GAACGGTGACTTCAACAATGGCCTCTTGTGCTGGAATGTG 3000
3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAATCACCGTT 3040
3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA 3080
3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCGT 3120
3121 GTGACCGCTTACAAGGAGGGATACGGTGAGGGTTGCGTGA 3160
3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT 3200
3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACACC 3240
3241 GTTACTTGCAACAACTACACTGGGACCCAGGAAGAGTACG 3280
3281 AAGGTACCTACACTAGCCGTAACCAAGGTTACGACGAAGC 3320
3321 TTACGGAAACAATCCTTCCGTTCCTGCTGACTATGCCTCC 3360
3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACGTGAGA 3400
3401 ACCCTTGCGAGTCCAACAGAGGTTACGGTGACTACACACC 3440
3441 ACTTCCAGCAGGCTATGTTACCAAGGACCTTGAGTACTTT 3480
3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520
3521 AGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCTTGAT 3560
3561 GGAGGAA 3567.


J. A structural gene which encodes a P2 insecticidal protein having the sequence:

1 ATGGACAACAACGTCTTGAACTCTGGTAGAACAACCATCT 40
41 GCGACGCATACAACGTCGTGGCTCACGATCCATTCAGCTT 80
81 CGAACACAAGAGCCTCGACACTATTCAGAAGGAGTGGATG 120
121 GAATGGAAACGTACTGACCACTCTCTCTACGTCGCACCTG 160
161 TGGTTGGAACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGG 200
201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT 240
241 ATCATCTTTCCATCTGGGTCCACTAATCTCATGCAAGACA 280
281 TCTTGAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA 320
321 CACTGATACCTTGGCTAGAGTCAACGCTGAGTTGATCGGT 360
361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGACA 400
401 ACTTCTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT 440
441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC 480
481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC 520
521 TTCTTCCACTCTTTGCTCAGGCTGCCAACATGCACTTGTC 560
561 CTTCATACGTGACGTGATCCTCAACGCTGACGAATGGGGA 600
601 ATCTCTGCAGCCACTCTTAGGACATACAGAGACTACTTGA 640
641 GGAACTACACTCGTGATTACTCCAACTATTGCATCAACAC 680
681 TTATCAGACTGCCTTTCGTGGACTCAATACTAGGCTTCAC 720
721 GACATGCTTGAGTTCAGGACCTACATGTTCCTTAACGTGT 760
761 TTGAGTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGAG 800
801 CTTGATGGTGTCCTCTGGAGCCAATCTCTACGCCTCTGGC 840
841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880
881 GGCCATTCTTGTATAGCTTGTTCCAAGTCAACTCCAACTA 920
921 CATTCTCAGTGGTATCTCTGGGACCAGACTCTCCATAACC 960
961 TTTCCCAACATTGGTGGACTTCCAGGCTCCACTACAACCC 1000
1001 ATAGCCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT 1040
1041 CAGCTCTGGATTGATTGGTGCAACTAACTTGAACCACAAC 1080
1081 TTCAATTGCTCCACCGTCTTGCCACCTCTGAGCACACCGT 1120
1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAAGG 1160
1161 AGTTGCTACCTCTACAAACTGGCAAACCGAGTCCTTCCAA 1200
1201 ACCACTCTTAGCCTTCGGTGTGGAGCTTTCTCTGCACGTG 1240
1241 GGAATTCAAACTACTTTCCAGACTACTTCATTAGGAACAT 1280
1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320
1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360
1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400
1401 TGTCCATAACAGGAAGAACAACATCTACGCTGCCAACGAG 1440
1441 AATGGCACCATGATTCACCTTGCACCAGAAGATTACACTG 1480
1481 GATTCACCATCTCTCCAATCCATGCTACCCAAGTGAACAA 1520
1521 TCAGACACGCACCTTCATCTCCGAAAAGTTCGGAAATCAA 1560
1561 GGTGACTCCTTGAGGTTCGAGCAATCCAACACTACCGCTA 1600
1601 GGTACACTTTGAGAGGCAATGGAAACAGCTACAACCTTTA 1640
1641 CTTGAGAGTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680
1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720
1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760
1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800
1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840
1841 CTGGAACTCCATTTGATCTCATGAACATCATGTTTGTGCC 1880
1881 AACTAACCTCCCTCCATTGTAC 1902.

K. A structural gene sequence encoding a fusion protein comprising the N-terminal 610 amino acids of
HD-1 and the C-terminal 567 amino acids of B.t.k. HD-73, said gene having the sequence:
1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040



EP 0 385 962 B1 

1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCCTCGAGGCTG 1840
1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 1880
1881 CTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAACGTT 1920
1921 ACTGACTATCACATTGACCAAGTGTCCAACTTGGTCACCT 1960
1961 ACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACT 2000
2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGACGAG 2040
2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 2080
2081 GGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGATCAC 2120
2121 CATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTACGTC 2160
2161 ACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCTACT 2200
2201 TGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTTCAC 2240
2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280
2281 CTTGAAATCTACTCGATCAGGTACAATGCCAAGCACGAGA 2320
2321 CCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACTTTC 2360
2361 TGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAACAGA 2400
2401 TGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACTGCT 2440
2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCA 2480
2481 CTTCTCCTTGGACATCGATGTGGGATGTACTGACCTGAAT 2520
2521 GAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGACCC 2560
2561 AAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCTCGA 2600
2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAG 2640
2641 AGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAACTCG 2680
2681 AATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGAGTC 2720
2721 CGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAGTTG 2760
2761 CAAGCCGACACCAACATCGCCATGATCCACGCCGCAGACA 2800
2801 AACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGAGTT 2840
2841 GTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAGGAA 2880
2881 CTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACGATG 2920
2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 2960
2961 CAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAGGAA 3000
3001 CAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGTGGG 3040
3041 AAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGGTAG 3080
3081 AGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGATAC 3120
3121 GGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACAACA 3160
3161 CCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGAAAT 3200
3201 CTATCCCAACAACACCGTTACTTGCAACGACTACACTGTG 3240
3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA 3280
3281 GAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTATGC 3320
3321 CTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGACGT 3360
3361 GAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACTACA 3400
3401 CACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGAGTA 3440
3441 CTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAA 3480
3481 ACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCT 3520
3521 TGATGGAGGAA 3531.



Claims

1. A method of modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus
thuringiensis, in order to improve the expression of said protein in plants, which comprises:

a) identifying regions within said sequence which have more than four consecutive adenine or thymine
nucleotides;

b) modifying those regions of step (a) which have two or more polyadenylation signals within a ten-base
sequence so as to remove said signals, while retaining a gene sequence which encodes said protein; and

c) modifying the 15-30 base regions surrounding the regions of step (a) so as to remove plant major
polyadenylation signals, consecutive sequences containing more than one minor polyadenylation signal, and
consecutive sequences containing more than one ATTTA sequence, while retaining a gene sequence which
encodes said protein.
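Step (a) of claim 1 amounts to a sequence scan. The following is a minimal Python sketch, assuming that "more than four consecutive adenine or thymine nucleotides" means any stretch of five or more bases drawn from A and T in any mix. The helpers `find_at_runs` and `flanking_window` are hypothetical, and the default 30-base flank simply takes the upper end of the 15-30 base range named in step (c).

```python
import re

# Assumption: a qualifying region is a maximal stretch of A and/or T
# (mixed in any order) that is at least five bases long.
AT_RUN = re.compile(r"[AT]{5,}")


def find_at_runs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, run) for each A/T stretch longer than four bases.

    Coordinates are 0-based and end-exclusive, as in Python slicing.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in AT_RUN.finditer(seq)]


def flanking_window(seq: str, start: int, end: int, flank: int = 30) -> str:
    """Return a region plus up to `flank` bases on each side (cf. step (c))."""
    return seq[max(0, start - flank):min(len(seq), end + flank)]


if __name__ == "__main__":
    demo = "GATATTTCCT"
    for start, end, run in find_at_runs(demo):
        print(start, end, run, flanking_window(demo, start, end, flank=2))
```

Because the pattern is greedy and is tried from left to right, each reported run is maximal; the flanking window is what steps (b) and (c) would then inspect for polyadenylation signals and ATTTA motifs.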

2. A method of modifying a wild-type structural gene sequence which encodes an insecticidal protein of Bacillus
thuringiensis, in order to improve the expression of said protein in plants, which comprises:

a) removing polyadenylation signals contained in said wild-type gene, while retaining a sequence which
encodes said protein; and

b) removing ATTTA sequences contained in said wild-type gene, while retaining a sequence which encodes
said protein.

3. A method according to claim 2, which further comprises removing self-complementary sequences and replacing
such sequences with non-self-complementary DNA having plant-preferred codons, while retaining a structural
gene sequence which encodes said protein.

4. A method according to any one of claims 1 to 3, which further comprises using plant-preferred sequences when
removing the polyadenylation signals and ATTTA sequences.

5. A method according to any one of claims 1 to 3, wherein the plant polyadenylation signals are selected from
the group consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT,
ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
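The motif screen implied by claims 2 and 5 can be sketched as a plain substring search. The tuple below copies the sixteen signals listed in claim 5. `find_motifs` is a hypothetical helper, and it reports overlapping hits because the claims do not say how overlapping occurrences should be counted.

```python
from collections.abc import Iterable

# The sixteen plant polyadenylation signals listed in claim 5.
PLANT_POLYA_SIGNALS = (
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
)
# Motif removed in step (b) of claim 2.
ATTTA = "ATTTA"


def find_motifs(seq: str, motifs: Iterable[str]) -> list[tuple[int, str]]:
    """Return sorted (position, motif) hits; positions are 0-based.

    Overlapping occurrences of the same motif are all reported.
    """
    seq = seq.upper()
    hits = []
    for motif in motifs:
        pos = seq.find(motif)
        while pos != -1:
            hits.append((pos, motif))
            pos = seq.find(motif, pos + 1)  # step one base to catch overlaps
    return sorted(hits)


if __name__ == "__main__":
    print(find_motifs("GGAATAAAGG", PLANT_POLYA_SIGNALS))
    print(find_motifs("ATTTATTTA", [ATTTA]))
```

A real screen would run this over both the wild-type and the modified gene and compare the hit lists; searching only the coding strand, as here, is itself an assumption.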

6. A method of improving the expression of a heterologous gene in plants, said gene comprising a modified chimeric
gene which contains a promoter that functions in plant cells and is operably linked to a structural coding sequence
and to a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the
addition of polyadenylate nucleotides to the 3' end of the RNA, wherein the structural coding sequence encodes
an insecticidal protein at least part of which is derived from a Bacillus thuringiensis protein, the method
comprising modifying said structural coding sequence so that it has a DNA sequence which differs from the
naturally occurring DNA sequence encoding said Bacillus thuringiensis protein, and so that said structural coding
sequence contains no more than 5 consecutive nucleotides consisting of either adenine or thymine residues.
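Whether a candidate coding sequence meets the limit in claim 6 can be checked by measuring its longest A/T stretch. This sketch rests on an assumption: the claim wording does not settle whether a run must be all adenine or all thymine, or may mix the two, so the hypothetical `longest_at_run` helper takes a `mixed` flag covering either reading.

```python
import re


def longest_at_run(seq: str, mixed: bool = True) -> int:
    """Length of the longest A/T stretch in `seq` (0 if there is none).

    mixed=True counts stretches such as ATTAT; mixed=False counts only
    homopolymer runs (AAAA... or TTTT...).
    """
    pattern = r"[AT]+" if mixed else r"A+|T+"
    return max((len(run) for run in re.findall(pattern, seq.upper())), default=0)


def meets_claim6_limit(seq: str, limit: int = 5, mixed: bool = True) -> bool:
    """True if no A/T stretch is longer than `limit` nucleotides."""
    return longest_at_run(seq, mixed) <= limit


if __name__ == "__main__":
    segment = "ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA"  # positions 1-40 of gene K
    print(longest_at_run(segment), meets_claim6_limit(segment))
```

The stricter mixed reading is the default here because it can only flag more stretches than the homopolymer reading, never fewer.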

7. Verfahren zur Verbesserung der Expression ines heteroiogen Gens in Pflanzen, wobei dieses Gen ein modifi- 
ziertes chimares Gen aufweist, das einen Promotor enthalt, der in Pflanzenzellen wirkt, der operabel mit einer 
strukturellen Codiersequenz und einer 3'-nicht-translatierten Region, die ein Polyadenyiierungssignal enthalt, das 
in Pflanzen wirkt, urn die Addition von Polyadenylat-Nukleotiden an das 3'-Ende der RNA zu bewirken, verbunden 
10 ist, wobei diese strukturelte Codiersequenz fur ein insektizides Protein codiert, von welchem mindestens ein Teil 

von einem BacilluS'thuringiensis-Proieln stammte, wobei das Verfahren das Modifizieren dieser strukturellen Co- 
diersequenz umfasst, so dass diese Sequenz eine DNA-Sequenz besitzt, die sich von der naturiicherweise vor- 
kommenden DNA-Sequenz, die fur das Bacillus-thuringiensis-ProXe'm codiert, unterscheidet und die folgenden 
Merkmale hat: 

15 diese strukturelle Codiersequenz hat eine Region, die zur folgenden Sequenz komplementar ist: 



GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC 
20 1 5 10 15 20 25 30 35 40 45 

wobei in der Codiersequenz dieser Region 2 AACCAA- und 1 AATTAA-Sequenz eiiminiert sind. 
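If "complementary" in claim 7 is read as the antiparallel (reverse) complement, the coding-strand region can be recovered from the listed sequence with a few lines of Python. This is a sketch under that assumption, and `reverse_complement` is a hypothetical helper. Under this reading the result is GAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAGGAATCAAGCC, which matches positions 163 to 210 of gene A under claim 10. It contains neither AACCAA nor AATTAA, although one AATCAA (another claim 5 signal) remains, which claim 7 does not require to be removed.

```python
# Watson-Crick base pairing, used to build the complementary strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antiparallel complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# The 48-nucleotide sequence recited in claim 7.
CLAIM7_REGION = "GGCTTGATTCCTAGCGAACTCTTCGATTCTCTGGTTGATGAGCTGTTC"

if __name__ == "__main__":
    coding = reverse_complement(CLAIM7_REGION)
    print(coding)
    for motif in ("AACCAA", "AATTAA", "AATCAA"):
        print(motif, motif in coding)
```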

8. A method according to claim 7, wherein the structural coding sequence encodes an insecticidal protein at least
part of which is derived from Bacillus thuringiensis kurstaki HD-1.

9. A method according to claim 7 or 8, wherein the plant is a tobacco plant.

10. A modified chimeric gene which contains a promoter that functions in plant cells and is operably linked to a
structural coding sequence and to a 3' non-translated region containing a polyadenylation signal which functions
in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding
sequence encodes an insecticidal protein at least part of which is derived from a Bacillus thuringiensis protein,
and wherein said structural coding sequence has a DNA sequence which differs from the naturally occurring DNA
sequence encoding said Bacillus thuringiensis protein and is selected from:

A. A structural gene which encodes an insecticidal protein of B.t.k. HD-1, having the sequence:
1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
81 TGCTGGATTTGTGTTAGGACTAGTTGATATTATCTGGGGA 120
121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG 200
201 GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTCCTCTCCG 400
401 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG 440
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 560
561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600
601 ATCAGGTACAACCAGTTCAGAAGAGAGCTTACACTAACTG 640
641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680
681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800
801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840
841 ACGGATGCTCATAGAGGAGAATACTACTGGTCCGGTCACC 880
881 AGATCATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT 1040
1041 CGGGATCAACAACCAACAACTATCTGTTCTTGACGGGACA 1080
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
1281 CTCTTGGATACATCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 CCTTCATCACAAATCACCCAAATCCCACTCACCAAGTCTA 1360
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400
1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440
1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520
1521 AAACCTTCAGTTCCACACATCAATTGACGGAAGACCTATT 1560
1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600
1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640
1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680
1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743.

B. A structural gene which encodes an insecticidal protein of B.t.k. HD-73, having the sequence:

1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 40
41 TGTCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGG 80
81 TGCTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGT 120
121 ATCTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAA 160
161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200
201 GAACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTC 240
241 TACCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG 280
281 ATCCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCA 320
321 ATTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA 360
361 TTGTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCG 400
401 TGTACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCG 440
441 AGACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCT 480
481 GCAACCATCAATAGCCGTTACAACGACCTTACTAGGCTGA 520
521 TTGGAAACTACACCGACCACGCTGTTCGTTGGTACAACAC 560
561 TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGG 600
601 ATTAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAG 640
641 TTTTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAG 680
681 AACCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAA 720
721 ATCTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCT 760
761 TCCGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAG 800
801 CCCACACTTGATGGACATCTTGAACAGCATAACTATCTAC 840
841 ACCGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACC 880
881 AGATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTT 920
921 TACCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCA 960
961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA 1000
1001 GAACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATAT 1040
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACA 1080
1081 GAGTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTG 1120
1121 TTTACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAAT 1160
1161 CCCACCACAGAACAACAATGTGCCACCCAGGCAAGGATTC 1200
1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 1240
1241 TCAGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTT 1280
1281 CTCTTGGATACACCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 GCATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATT 1400
1401 CACTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAAT 1440
1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480
1481 TCCCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTA 1520
1521 TGCTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 1560
1561 AATTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTA 1600
1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTT 1640
1641 TGAAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATC 1680
1681 GTGGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTA 1720
1721 TCGACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGA 1760
1761 GGCTGAG 1767.

C. A structural gene which encodes an insecticidal protein of B.t.k. HD-1, having the sequence:
1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCTTCGAGGCTG 1840
1841 AGTAC 1845.



D. A structural gene which encodes an insecticidal protein derived from B.t.k. HD-73, having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880
1881 CTGTTTACGTCTACAAACCAGCTTGGACTCAAGACAAATG 1920
1921 G 1921.



E. A structural gene which encodes the full-length insecticidal protein of B.t.k. HD-73, having the sequence:



1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCAAT 1920
1921 GTGACGGATTATCATATTGATCAAGTGTCCAACTTGGTGA 1960
1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAGAAGT 2680
2681 TGGAATGGGAGACCAACATCGTCTACAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 2920
2921 ATCCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.



F. a structural gene encoding the full-length insecticidal protein of B.t.k. HD-73, having the sequence:



1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTCTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040
2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280
2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.

G. a structural gene encoding the full-length insecticidal protein of B.t.k. HD-73, having the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTCTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880
1881 CCTCTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAAC 1920
1921 GTTACTGACTATCACATTGACCAAGTGTCCAACTTGGTCA 1960
1961 CCTACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGA 2000
2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC 2040
2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA 2080
2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAT 2120
2121 CACCATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC 2160
2161 GTCACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCT 2200
2201 ACTTGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTT 2240
2241 CACCAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAA 2280
2281 GACCTTGAAATCTACTCGATCAGGTACAATGCCAAGCACG 2320
2321 AGACCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACT 2360
2361 TTCTGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAAC 2400
2401 AGATGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACT 2440
2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCA 2480
2481 TCACTTCTCCTTGGACATCGATGTGGGATGTACTGACCTG 2520
2521 AATGAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGA 2560
2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600
2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640
2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680
2681 TCGAATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGA 2720
2721 GTCCGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAG 2760
2761 TTGCAAGCCGACACCAACATCGCCATGATCCACGCCGCAG 2800
2801 ACAAACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGA 2840
2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880
2881 GAACTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAG 3000
3001 GAACAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGT 3040
3041 GGGAAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGG 3080
3081 TAGAGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGA 3120
3121 TACGGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACA 3160
3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200
3201 AATCTATCCCAACAACACCGTTACTTGCAACGACTACACT 3240
3241 GTGAATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTA 3280
3281 ACAGAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTA 3320
3321 TGCCTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGA 3360
3361 CGTGAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACT 3400
3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440
3441 GTACTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480
3481 GAAACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTC 3520
3521 TCTTGATGGAGGAA 3534.

H. a structural gene encoding an insecticidal protein of B.t.t., having the sequence:

1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40
41 CCACTAAGGATGTTATCCAGAAGGGTATCTCCGTTGTGGG 80
81 AGACCTCTTGGGCGTGGTTGGATTTCCCTTCGGTGGAGCC 120
121 CTCGTGAGCTTCTATACAAACTTTCTCAACACCATTTGGC 160
161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200
201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCCAAGAAC 240
241 AAGGCTTTGGCAGAACTCCAGGGCCTTCAGAACAATGTGG 280
281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320
321 TGTTAGCTCCAGAAATCCTCACAGCCAAGGTAGGATCAGA 360
361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400
401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440
441 CACTACCTATGCTCAAGCTGCCAACACCCACTTGTTTCTC 480
481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACG 520
521 AGAAAGAGGACATTGCTGAGTTCTACAAGCGTCAACTTAA 560
561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600
601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640
641 CTTGGGTGAACTTCAACAGATACAGGAGAGAGATGACCTT 680
681 GACTGTGCTCGATCTTATCGCACTCTTTCCCTTGTACGAT 720
721 GTGAGACTCTACCCAAAGGAAGTGAAAACTGAGCTTACCA 760
761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800
801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840
841 ATTAGGAAACCACATCTCTTCGACTATCTECACAGAATTC 880
881 AATTCCACACAAGGTTTCAACCAGGATACTATGGTAACGA 920
921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960
961 CCAAGCATTGGATCTAATGACATCATCACATCTCCCTTCT 1000
1001 ATGGTAACAAGTCCAGTGAACCTGTGCAGAACCTTGAGTT 1040
1041 CAACGGCGAGAAAGTCTATAGAGCCGTCGCAAACACCAAT 1080
1081 CTCGCTGTGTGGCCATCCGCAGTTTACTCAGGCGTCACAA 1120
1121 AGGTGGAGTTTAGTCAGTATAACGATCAGACCGATGAGGC 1160
1161 CAGCACCCAGACTTACGACTCCAAACGTAACGTTGGCGCA 1200
1201 GTCTCTTGGGATTCTATCGACCAATTGCCTCCAGAAACCA 1240
1241 CAGACGAACCATTGGAGAAGGGCTACAGCCACCAACTTAA 1280
1281 CTATGTGATGTGCTTCTTGATGCAAGGTTCCAGAGGGACC 1320
1321 ATTCCAGTGTTGACCTGGACACACAAGTCCGTGGACTTCT 1360
1361 TCAACATGATCGATAGCAAGAAGATCACTCAACTTCCCTT 1400
1401 GGTGAAAGCCTACAAGCTGCAATCTGGTGCTTCCGTTGTC 1440
1441 GCAGGTCCCAGATTCACTGGAGGTGACATCATCCAGTGCA 1480
1481 CAGAGAACGGCAGCGCAGCTACTATCTACGTGACACCTGA 1520
1521 TGTGTCTTACTCTCAGAAGTACAGGGCACGTATTCATTAC 1560
1561 GCATCTACCAGCCAGATCACCTTCACACTCAGCTTGGATG 1600
1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640
1641 CAAAGGTGACACTCTCACATACAATAGCTTCAACTTGGCA 1680
1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTTC 1720
1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAGACAAAGTCTA 1760
1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791.

I. a structural gene encoding an insecticidal protein of B.t. entomocidus, having the sequence:



1 ATGGAGGAGAACAACCAAAACCAATGCATTCCATACAACT 40
41 GCTTGAGTAACCCAGAAGAGGTATTGCTTGATGGAGAACG 80
81 CATTTCAACCGGTAACTCTTCCATCGACATCTCCTTGTCC 120
121 TTGGTCCAGTTTCTGGTCAGCAACTTCGTGCCAGGTGGTG 160
161 GGTTCCTTGTCGGACTAATTGACTTCGTCTGGGGTATCGT 200
201 TGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATTGAG 240
241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280
281 CTGCCATCGCTAACTTGGAAGGATTGGGCAATAACTTCAA 320
321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360
361 AACAACCCAGAGACCCGCACTAGGGTGATCGACAGATTCA 400
401 GAATCTTGGACGGCCTCTTGGAGAGAGATATCCCATCCTT 440
441 CAGAATCTCTGGCTTCGAAGTTCCTCTCTTGTCCGTGTAC 480
481 GCTCAAGCAGCTAATCTTCACCTCGCTATCCTTCGAGACA 520
521 GTGTCATCTTTGGGGAAAGGTGGGGATTGACCACTATCAA 560
561 CGTCAATGAGAATTACAACAGACTTATCAGGCACATTGAC 600
601 GAGTACGCCGACCACTGTGCTAACACCTACAACCGTGGCT 640
641 TGAACAATCTCCCTAAGTCTACTTATCAAGATTGGATTAC 680
681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG 720
721 GACATTGCAGCTTTCTTCCCGAACTATGACAACAGGAGAT 760
761 ACCCTATCCAACCAGTGGGTCAACTTACCAGAGAAGTCTA 800
801 TACTGACCCACTTATCAACTTCAACCCTCAGTTGCAAAGT 840
841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC 880
881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT 920
921 TACTATCTTCACCGATTGGTTCAGCGTTGGGCGTAACTTC 960
961 TATTGGGGTGGACACAGGGTCATCTCCTCTCTTATTGGAG 1000
1001 GTGGGAACATTACCTCTCCTATCTATGGACGTGAGGCAAA 1040
1041 CCAGGAGCCACCACGTAGTTTCACCTTCAACGGTCCAGTC 1080
1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAGC 1120
1121 AACCTTGGCCAGCTCCACCTTTCAACCTTAGAGGTGTTGA 1160
1161 GGGCGTTGAGTTCTCTACTCCTACCAACTCCTTCACTTAC 1200
1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC 1240
1241 CAGAGGACAATAGCGTGCCACCCAGGGAAGGCTACTCCCA 1280
1281 CAGGTTGTGCCACGCAACCTTCGTGCAGCGTTCCGGAACT 1320
1321 CCATTCCTCACTACAGGAGTTGTGTTCTCATGGACTGATC 1360
1361 GTAGTGCTACTCTCACTAATACCATTGATCCCGAGAGGAT 1400
1401 CAATCAAATCCCATTGGTCAAGGGTTTCCGTGTGTGGGGA 1440
1441 GGAACTTCTGTCATCACAGGACCAGGCTTCACAGGAGGTG 1480
1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT 1520
1521 CCAAGTTAACATCAACTCTCCAATTACTCAAAGATATCGT 1560
1561 CTCAGGTTTCGTTACGCATCTTCCCGTGACGCTAGAGTCA 1600
1601 TCGTGCTCACCGGAGCAGCTTCTACCGGTGTCGGTGGACA 1640
1641 AGTCTCCGTGAACATGCCACTCCAGAAGACTATGGAGATC 1680
1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT 1720
1721 TCTCTAACCCTTTCAGTTTCCGTGCCAACCCTGACATCAT 1760
1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC 1800
1801 TCATCTGGCGAATTGTACATTGACAAGATTGAGATCATTC 1840
1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG 1880
1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT 1920
1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG 1960
1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT 2000
2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA 2040
2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG 2080
2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG 2120
2121 TTGGAGAGGAAGCACCGACATCACCATCCAAGGAGGCGAC 2160
2161 GATGTGTTCAAGGAGAACTACGTCACCCTCCCAGGAACTG 2200
2201 TGGACGAGTGCTACCCTACCTACTTGTACCAGAAGATCGA 2240
2241 TGAGTCCAAACTCAAAGCCTACACCAGGTATGAACTTAGA 2280
2281 GGCTACATCGAAGACAGCCAAGACCTTGAAATCTACCTCA 2320
2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG 2360
2361 TACTGGTTCCCTCTGGCCACTTTCTGCCCAAATGCCCATT 2400
2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCGACACCTTG 2440
2441 AGTGGAATCCTGACTTGGACTGCTCCTGCAGGGATGGCGA 2480
2481 GAAGTGTGCCCACCATTCTCATCACTTCACCTTGGACATC 2520
2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT 2560
2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG 2600
2601 ACTTGGCAACCTTGAGTTTCTCGAAGAGAAACCATTGCTC 2640
2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT 2680
2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT 2720
2721 CGTTTACAAGGAGGCCAAAGAGTCCGTGGATGCTTTGTTC 2760
2761 GTGAACTCCCAATATGATAGGTTGCAAGTGGACACCAACA 2800
2801 TCGCCATGATCCACGCTGCAGACAAACGTGTGCACAGGAT 2840
2841 TCGTGAGGCTTACTTGCCTGAGTTGTCCGTGATCCCTGGT 2880
2881 GTGAACGCTGCCATCTTCGAGGAACTTGAGGGACGTATCT 2920
2921 TTACCGCATACTCCTTGTACGATGCCAGAAACGTCATCAA 2960
2961 GAACGGTGACTTCAACAATGGCCTCTTGTGCTGGAATGTG 3000
3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAATCACCGTT 3040
3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA 3080
3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCGT 3120
3121 GTGACCGCTTACAAGGAGGGATACGGTGAGGGTTGCGTGA 3160
3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT 3200
3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACACC 3240
3241 GTTACTTGCAACAACTACACTGGGACCCAGGAAGAGTACG 3280
3281 AAGGTACCTACACTAGCCGTAACCAAGGTTACGACGAAGC 3320
3321 TTACGGAAACAATCCTTCCGTTCCTGCTGACTATGCCTCC 3360
3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACGTGAGA 3400
3401 ACCCTTGCGAGTCCAACAGAGGTTACGGTGACTACACACC 3440
3441 ACTTCCAGCAGGCTATGTTACCAAGGACCTTGAGTACTTT 3480
3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520
3521 AGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCTTGAT 3560
3561 GGAGGAA 3567.

J. a structural gene encoding an insecticidal P2 protein, having the sequence:

1 ATGGACAACAACGTCTTGAACTCTGGTAGAACAACCATCT 40
41 GCGACGCATACAACGTCGTGGCTCACGATCCATTCAGCTT 80
81 CGAACACAAGAGCCTCGACACTATTCAGAAGGAGTGGATG 120
121 GAATGGAAACGTACTGACCACTCTCTCTACGTCGCACCTG 160
161 TGGTTGGAACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGG 200
201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT 240
241 ATCATCTTTCCATCTGGGTCCACTAATCTCATGCAAGACA 280
281 TCTTGAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA 320
321 CACTGATACCTTGGCTAGAGTCAACGCTGAGTTGATCGGT 360
361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGACA 400
401 ACTTCTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT 440
441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC 480
481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC 520
521 TTCTTCCACTCTTTGCTCAGGCTGCCAACATGCACTTGTC 560
561 CTTCATACGTGACGTGATCCTCAACGCTGACGAATGGGGA 600
601 ATCTCTGCAGCCACTCTTAGGACATACAGAGACTACTTGA 640
641 GGAACTACACTCGTGATTACTCCAACTATTGCATCAACAC 680
681 TTATCAGACTGCCTTTCGTGGACTCAATACTAGGCTTCAC 720
721 GACATGCTTGAGTTCAGGACCTACATGTTCCTTAACGTGT 760
761 TTGAGTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGAG 800
801 CTTGATGGTGTCCTCTGGAGCCAATCTCTACGCCTCTGGC 840
841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880
881 GGCCATTCTTGTATAGCTTGTTCCAAGTCAACTCCAACTA 920
921 CATTCTCAGTGGTATCTCTGGGACCAGACTCTCCATAACC 960
961 TTTCCCAACATTGGTGGACTTCCAGGCTCCACTACAACCC 1000
1001 ATAGCCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT 1040
1041 CAGCTCTGGATTGATTGGTGCAACTAACTTGAACCACAAC 1080
1081 TTCAATTGCTCCACCGTCTTGCCACCTCTGAGCACACCGT 1120
1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAAGG 1160
1161 AGTTGCTACCTCTACAAACTGGCAAACCGAGTCCTTCCAA 1200
1201 ACCACTCTTAGCCTTCGGTGTGGAGCTTTCTCTGCACGTG 1240
1241 GGAATTCAAACTACTTTCCAGACTACTTCATTAGGAACAT 1280
1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320
1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360
1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400
1401 TGTCCATAACAGGAAGAACAACATCTACGCTGCCAACGAG 1440
1441 AATGGCACCATGATTCACCTTGCACCAGAAGATTACACTG 1480
1481 GATTCACCATCTCTCCAATCCATGCTACCCAAGTGAACAA 1520
1521 TCAGACACGCACCTTCATCTCCGAAAAGTTCGGAAATCAA 1560
1561 GGTGACTCCTTGAGGTTCGAGCAATCCAACACTACCGCTA 1600
1601 GGTACACTTTGAGAGGCAATGGAAACAGCTACAACCTTTA 1640
1641 CTTGAGAGTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680
1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720
1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760
1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800
1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840
1841 CTGGAACTCCATTTGATCTCATGAACATCATGTTTGTGCC 1880
1881 AACTAACCTCCCTCCATTGTAC 1902

or

K. a structural gene sequence encoding a fusion protein that comprises the N-terminal 610 amino acids of B.t.k. HD-1 and the C-terminal 567 amino acids of B.t.k. HD-73, which gene has the sequence:

1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAATCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGAACCGTCGGTTTCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTTTCACCCTTAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAAGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCCTCGAGGCTG 1840
1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 1880
1881 CTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAACGTT 1920
1921 ACTGACTATCACATTGACCAAGTGTCCAACTTGGTCACCT 1960
1961 ACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACT 2000
2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGACGAG 2040
2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 2080
2081 GGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGATCAC 2120
2121 CATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTACGTC 2160
2161 ACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCTACT 2200
2201 TGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTTCAC 2240
2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280
2281 CTTGAAATCTACTCGATCAGGTACAATGCCAAGCACGAGA 2320
2321 CCGTGAATGTCCCAGGTACTGGTTCCCTCTGGCCACTTTC 2360
2361 TGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAACAGA 2400
2401 TGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACTGCT 2440
2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCA 2480
2481 CTTCTCCTTGGACATCGATGTGGGATGTACTGACCTGAAT 2520
2521 GAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGACCC 2560
2561 AAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCTCGA 2600
2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAG 2640
2641 AGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAACTCG 2680
2681 AATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGAGTC 2720
2721 CGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAGTTG 2760
2761 CAAGCCGACACCAACATCGCCATGATCCACGCCGCAGACA 2800
2801 AACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGAGTT 2840
2841 GTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAGGAA 2880
2881 CTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACGATG 2920
2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 2960
2961 CAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAGGAA 3000
3001 CAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGTGGG 3040
3041 AAGCTGAAGTGTCCCAAGAGGTTAGAGTCTGTCCAGGTAG 3080
3081 AGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGATAC 3120
3121 GGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACAACA 3160
3161 CCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGAAAT 3200
3201 CTATCCCAACAACACCGTTACTTGCAACGACTACACTGTG 3240
3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA 3280
3281 GAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTATGC 3320
3321 CTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGACGT 3360
3361 GAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACTACA 3400
3401 CACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGAGTA 3440
3441 CTTTCCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAA 3480
3481 ACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCT 3520
3521 TGATGGAGGAA 3531.
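The length of listing K can be cross-checked against its heading: each amino acid is encoded by one three-nucleotide codon, so 610 + 567 = 1177 residues correspond to 3 × 1177 = 3531 nt, which matches the final coordinate of the listing. The short Python sketch below only restates that arithmetic; it assumes that the 3531 nt are codons only (no stop codon), and the constant and function names are illustrative, not taken from the patent.

```python
# Codon arithmetic for the HD-1/HD-73 fusion gene of listing K.
# Assumption: the 3531-nt listing consists of codons only, with no stop codon.
HD1_N_TERMINAL_AA = 610   # N-terminal residues taken from B.t.k. HD-1
HD73_C_TERMINAL_AA = 567  # C-terminal residues taken from B.t.k. HD-73
CODON_LENGTH = 3          # nucleotides per codon


def coding_length(n_aa: int, c_aa: int) -> int:
    """Nucleotides needed to encode n_aa + c_aa residues."""
    return CODON_LENGTH * (n_aa + c_aa)


total_aa = HD1_N_TERMINAL_AA + HD73_C_TERMINAL_AA
total_nt = coding_length(HD1_N_TERMINAL_AA, HD73_C_TERMINAL_AA)
junction_nt = CODON_LENGTH * HD1_N_TERMINAL_AA  # last nucleotide of the HD-1 part

print(total_aa, total_nt, junction_nt)  # prints: 1177 3531 1830
```

Under the same assumption, the HD-1-derived part would occupy nucleotides 1 to 1830 and the HD-73-derived part nucleotides 1831 to 3531 (567 × 3 = 1701 nt).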



Claims

1. A method of modifying a wild-type structural gene sequence that encodes an insecticidal protein of
Bacillus thuringiensis in order to enhance expression of said protein in plants, comprising:

a) identifying regions within said sequence that comprise more than four consecutive adenine or thymine
nucleotides,

b) modifying those regions of step a) that contain two or more polyadenylation signals within a ten-base
sequence so as to remove said signals, while maintaining a gene sequence that encodes said protein, and

c) modifying regions of 15 to 30 bases surrounding the regions of step a) so as to remove major plant
polyadenylation signals, consecutive sequences containing more than one minor polyadenylation signal, and
consecutive sequences containing more than one ATTTA sequence, while maintaining a gene sequence that
encodes said protein.
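Steps a) and c) of claim 1 are mechanical enough to sketch in code. The Python below is a minimal illustration, not the patentee's procedure. It assumes that a region of "more than four consecutive adenine or thymine nucleotides" may mix A and T, and it reads "regions of 15 to 30 bases surrounding" a run as a flank of that width on each side. The function names are hypothetical.

```python
import re


def at_rich_regions(seq: str, min_run: int = 5) -> list[tuple[int, int]]:
    """Step a): half-open (start, end) spans of more than four consecutive A/T bases.

    Assumption: a run may mix A and T, so 'ATTTA' counts as one five-base run.
    """
    pattern = re.compile(f"[AT]{{{min_run},}}")  # builds the regex [AT]{5,}
    return [m.span() for m in pattern.finditer(seq.upper())]


def flanking_windows(seq: str, flank: int = 15) -> list[tuple[int, int]]:
    """Step c): each A/T-rich region widened by `flank` bases on both sides.

    The claim speaks of regions of 15 to 30 bases around each run, so `flank`
    is restricted to that range; windows are clipped to the sequence ends.
    """
    if not 15 <= flank <= 30:
        raise ValueError("flank must be between 15 and 30 bases")
    return [
        (max(0, start - flank), min(len(seq), end + flank))
        for start, end in at_rich_regions(seq)
    ]


# Row 2281 of listing F (40 nt).
row_f_2281 = "GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG"
print(at_rich_regions(row_f_2281))       # prints: [(12, 21)]
print(flanking_windows(row_f_2281, 15))  # prints: [(0, 36)]
```

In this row the only qualifying stretch is the nine-base TATTTAATT (0-based positions 12 to 20). With a 15-base flank, the window to inspect for step c) covers positions 0 to 35.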
2. A method of modifying a wild-type structural gene sequence that encodes an insecticidal protein of
Bacillus thuringiensis in order to enhance expression of said protein in plants, comprising:

a) removing the polyadenylation signals contained in said wild-type gene while maintaining a sequence that
encodes said protein, and

b) removing the ATTTA sequences contained in said wild-type gene while maintaining a sequence that
encodes said protein.
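Step b) of claim 2 starts by locating every ATTTA motif. Here is a minimal Python sketch. It assumes that overlapping occurrences should all be reported (in ATTTATTTA the second motif begins on the last base of the first), and `attta_sites` is a hypothetical helper name.

```python
def attta_sites(seq: str, motif: str = "ATTTA") -> list[int]:
    """Return 0-based start positions of every occurrence of `motif`, overlaps included."""
    seq = seq.upper()
    last_start = len(seq) - len(motif)  # negative for short inputs -> empty range
    return [i for i in range(last_start + 1) if seq.startswith(motif, i)]


# Row 2281 of listing F contains one ATTTA motif.
print(attta_sites("GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG"))  # prints: [13]
print(attta_sites("ATTTATTTA"))  # prints: [0, 4]
```

`str.find` in a loop would work equally well. The `startswith(motif, i)` form is used so that overlapping hits are not skipped.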

3. The method of claim 2, further comprising removing self-complementary sequences and replacing such
sequences with non-self-complementary DNA comprising plant-preferred codons, while maintaining a structural
gene sequence that encodes said protein.
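Claims 1 to 3 require every replacement to keep "a sequence that encodes said protein". One way to check that condition is to translate the original and the modified sequence and compare the products. The sketch below assumes the standard genetic code (NCBI translation table 1) and an in-frame input whose length is a multiple of three. The helper names are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon.

    Characters other than A, C, G, T raise KeyError.
    """
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def same_protein(original: str, modified: str) -> bool:
    """True if both sequences encode the same amino-acid chain."""
    return translate(original) == translate(modified)


# First 39 nt (13 codons) of the two rows numbered 2281 in this section:
# one from listing F, one from the listing that ends just before heading F.
seq_a = "GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACAT"
seq_b = "GACCTCGAGATCTACCTCATCCGCTACAATGCAAAACAT"
print(translate(seq_a))            # prints: DLEIYLIRYNAKH
print(same_protein(seq_a, seq_b))  # prints: True
```

The two stretches differ at eleven positions, yet every change is synonymous (for example TTA and CTC both encode leucine). This is the kind of substitution the claims describe.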

4. The method of any one of claims 1 to 3, further comprising using plant-preferred sequences when removing
the polyadenylation signals and the ATTTA sequences.

5. A method according to any one of claims 1 to 3, wherein the plant polyadenylation signals are selected from the group consisting of AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA and CATAAA.
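The sixteen hexamers listed above can be searched for directly. The sketch below reports every occurrence, overlapping ones included, and also tests the "two or more polyadenylation signals within a ten-base sequence" condition from the first claim. It assumes that both signals must lie entirely inside the ten bases; that interpretation, and the function names, are assumptions for illustration.

```python
PLANT_POLYA_SIGNALS = frozenset({
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA",
    "ATAAAA", "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA",
    "ATTAAA", "AATTAA", "AATACA", "CATAAA",
})

def find_signals(seq, signals=PLANT_POLYA_SIGNALS):
    """All (position, hexamer) hits, overlapping hits included."""
    seq = seq.upper()
    return [(i, seq[i:i + 6]) for i in range(len(seq) - 5)
            if seq[i:i + 6] in signals]

def has_signal_cluster(seq, window=10, min_hits=2):
    """True if at least min_hits signals fit inside one window-base
    stretch (assumed reading of the ten-base condition)."""
    starts = [i for i, _ in find_signals(seq)]
    for k in range(len(starts) - min_hits + 1):
        if starts[k + min_hits - 1] + 6 - starts[k] <= window:
            return True
    return False

hits = find_signals("CATAAAATAAA")
# [(0, 'CATAAA'), (1, 'ATAAAA'), (3, 'AAAATA'), (5, 'AATAAA')]
```

Here `has_signal_cluster("CATAAAATAAA")` is true, while two AATAAA copies separated by eight G/C bases (`"AATAAA" + "GCGCGCGC" + "AATAAA"`) are found individually but do not form a cluster.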

6. A method for improving the expression of a heterologous gene in plants, wherein said gene comprises a modified chimeric gene comprising a promoter which functions in plant cells, operably linked to a structural coding sequence and to a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a part of which is derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence such that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein, and said structural coding sequence contains no more than 5 consecutive nucleotides consisting of either adenine or thymine residues.
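The three conditions of this claim can be checked mechanically for a candidate coding sequence: the DNA differs from the native DNA, both encode the same protein, and no A/T stretch exceeds five bases. The sketch below assumes the standard genetic code and reads "consecutive nucleotides consisting of either adenine or thymine residues" as a mixed A/T run (a stricter homopolymer reading would give different results). The function names and the toy sequences are hypothetical.

```python
import re
from itertools import product

# Standard genetic code; codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

def translate(seq):
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq), 3))

def longest_at_run(seq):
    """Length of the longest stretch made only of A and/or T."""
    return max((len(run) for run in re.findall(r"[AT]+", seq.upper())), default=0)

def satisfies_claim_6(native, modified, max_run=5):
    """Modified DNA differs from the native DNA, encodes the same protein,
    and has no A/T stretch longer than max_run (assumed reading)."""
    return (modified.upper() != native.upper()
            and translate(modified) == translate(native)
            and longest_at_run(modified) <= max_run)

native = "ATGAAAATTTAT"    # toy sequence, encodes MKIY; longest A/T run is 9
modified = "ATGAAGATCTAC"  # synonymous recoding, also MKIY; longest run is 2
```

With these toy inputs, `satisfies_claim_6(native, modified)` is true, and comparing a sequence with itself returns false because the DNA must differ.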

7. A method for improving the expression of a heterologous gene in plants, wherein said gene comprises a modified chimeric gene comprising a promoter which functions in plant cells, operably linked to a structural coding sequence and to a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a part of which is derived from a Bacillus thuringiensis protein, wherein said method comprises modifying said structural coding sequence such that said sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and has the following characteristics:

said structural coding sequence comprises a region which is complementary to the following sequence:

TCCTTGATTCCTAGCGAACT

said region in said coding sequence having had 2 AACCAA sequences and 1 AATTAA sequence eliminated.
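Two small computations relate to this claim: the sequence complementary to the listed probe, and a count of how many AACCAA and AATTAA occurrences were removed from a region. The sketch below assumes that "complementary" means the antiparallel (reverse) complement and uses toy native and modified strings that are not taken from the patent; the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def count_overlapping(seq, motif):
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))

def eliminated_counts(native, modified, motifs=("AACCAA", "AATTAA")):
    """How many occurrences of each motif disappear between the two regions."""
    return {m: count_overlapping(native, m) - count_overlapping(modified, m)
            for m in motifs}

probe = "TCCTTGATTCCTAGCGAACT"
target = reverse_complement(probe)         # "AGTTCGCTAGGAATCAAGGA"

toy_native = "AACCAAGGAACCAAGGAATTAA"      # illustrative only
toy_modified = "AACCTGGGAACCTGGGAGTTCA"    # illustrative only
```

For the toy pair, `eliminated_counts(toy_native, toy_modified)` gives `{"AACCAA": 2, "AATTAA": 1}`, which is the pattern the claim describes.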

8. A method according to claim 7, wherein said structural coding sequence encodes an insecticidal protein at least a part of which is derived from Bacillus thuringiensis kurstaki HD-1.

9. A method according to claim 7 or 8, wherein the plant is a tobacco plant.

10. A modified chimeric gene containing a promoter which functions in plant cells, operably linked to a structural coding sequence and to a 3' non-translated region containing a polyadenylation signal which functions in plants to cause the addition of polyadenylate nucleotides to the 3' end of the RNA, wherein said structural coding sequence encodes an insecticidal protein at least a part of which is derived from a Bacillus thuringiensis protein, wherein said structural coding sequence has a DNA sequence which differs from the naturally occurring DNA sequence encoding said Bacillus thuringiensis protein and is selected from:

A. A structural gene which encodes an insecticidal protein of B.t.k. HD-1, having the sequence:
1 ATGGCTATAGAAACTGGTTACACCCCaATCGATArrTCCT
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG
81 TGCTGGATTTGTGTTAGGACTAGTTGATATTATCTGGGGA
121 ATTTTTGCTCCCTCTCAATGGGACGCATTTCTTGTACAAA
161 TTGAACAGCTCATCAACCAGAGAATCGAAGAGTTCGCTAG
201 GAATCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTCCTCTCCG
401 TGTACGTTCAAGCTGCCAACCTCCACCTCTCAGTTTTGAG
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA
521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC
561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG
601 ATCAGGTACAACCAGTTCAGAAGAGAGCTTACACTAACTG
641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG
681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA
721 ATTTATACAAACCCAGTATTAGAAAAXTTTGATGGTAG7T 760
761 TTCGAGGCTCGGCTCAGGGCAIAGAAGGAAGTATTAGGAG 800
801 TCCACATTTGATGCATATACrraArAGTATAACCATCTAT 840
841 ACGGAXGCTCATAGAGGAGAArACTACTGGTCCGGTCACC 880
881 AGAICATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAArr 920
921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAACAT 1040
1041 CGGGATCAACAACCAACAACTATCTGTTCTTGACGGGACA 1080
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
1161 ACCGCCAOVGAATAACAACGIGCCACCraGGCaAGGaTTT 1200
1201 AGTCATCGAOTAAGCCATGTTTCAAXGTTTCGTTCAGGCT 1240
1241 TTAGTAATAGTAGTGTAAGTATAAIAAGAGCTCCTATGTT 1280
1281 CTCTTGGATACATCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 CCTTCATCACAAATCACCCAAATCCCACTCACCAAGTCTA 1360
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400
1401 ATTTACAGGAGGAGATJOTCTTCGAAGAACTTCACCTGGC 1440
1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520
1521 AAACCTTCAGTTCCACACATCAATTGACGGAAGACCTATT 1560
1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600
1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640
1641 TCCGTTTAACTTTTCAAATG6ATCAAGTGTATTTACGTTA 1680
1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743.

B. A structural gene which encodes an insecticidal protein of B.t.k. HD-73, having the sequence:
1 ATGGCCATTGAAACCGGTTACACTCCCATCGACATCTCCT 40
41 TGTCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGG 80
81 TGCTGGGTTCGTTCTCGGACTAGTTGACATCaTCTGGGGT 120
121 ATCTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAA 160
161 TTGAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAG 200
201 GAACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTC 240
241 TACCAAATCTATGCAGAGAGC7TCAGAGAGTGGGAAGCCG 280
281 ATCCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCA 320
321 ATTCAACGACATGAACAGCGCCTTGACCACAGCTATCCCA 360
361 TTGTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCG 400
401 TGTACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCG 440
441 AGACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCT 480
481 GCAACCATCAATAGCCGTTACAACGACCTTACT^GGCTGA 520
521 TTGGAJLACTACACCGACCACGCTGTrCGTTGGTACaAC3lC 560
561 TGGCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGG 600
601 ATTAGATACAACCAGTTCAGGAGAGAAITGACCCTCA^ 640
641 TTTTGGACATTGTGTCXCTeXTCCCG»ACIM(^^ 680
681 AACCrTACCCTATCCGll^CAGTGTCCC^CTTACCAGAGAA 720
721 ATCTAIACTAACCCAGITCTl^GAACTTCGACGGTAGCT 760
761 TCCGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAG 800
801 CCCACACTTGAXGGACATCTTGAACAGCAXAACTATCTM 840
841 ACCGATGCTCACAGAGGAG&GTATTACTGGTCTGGACACC 880
881 AGATCAXGGCCTCTCCAGTTGGAXTCAGCGGGCCCGAGTT 920
921 TACCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCA 960
961 CAACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA 1000
1001 GAACCTTGTCTTCCACCXTGTACAGAAGACCCTTCAATAT 1040
1041 CGGTATCAACAACCAGCAACTTTCCGTTCTTCSiCGGAACA 1080
1081 GaGTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTG 1120
1121 TTTACAGAAAGAGCGG&ACCGTTGATTCCTTGGACGAAAT 1160
1161 CCCACCACAGAACAACAATGTGCCACCCAGGCAAGGATTC 1200
1201 TCCCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGAT 1240
1241 7CAGCAACAG7TCCGTGAGCA7CAICAGAGCTCCTATGTT 1280
1281 CTCTTGGATACACCGTAGTGCTGAGTTCAACAACATCATC 1320
1321 GCATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
1361 ACTTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATT 1400
1401 CACTGGTGGAGACCTCGTTAGACTCA^CAGCAGTGGAAAT 1440
1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480
1481 TCCCATCCACATCTACCAGAIATAGAGTTCGTGTGAGGTA 1520
1521 TGCTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGT 1560
1561 AATTCATCCATCTTCTCCAATACAGTrCCAGCTACAGCTA 1600
1601 CCTCCTTGGATAATCTCCAATCCAGCGATTTCGGTTACTT 1640
1641 TGAAAGTGCCAATGCTTTTACAXCTTCACTCGGTAACATC 1680
1681 GTGGGTGTTAGAAACTTTAG7GGGACTGCAGGAGTGATTA 1720
1721 TCGACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGA 1760
1761 GGCTGAG 1767.

C. A structural gene encoding an insecticidal protein of B.t.k. HD-1, having the sequence:
1 ATGGACAACAACCCAAACATCAACGAATGCMJTCCAIACA 40
41 ACTGCTTGAGTAACCOU3AAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCAJCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 C77TGGTCCATCTCAATGGGATGCATTCCTGG7GCAAAZT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACC^^CCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTAXTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAO^GCTAATCTTC^ 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGG3AITCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACSACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGAT2CTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGoACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGuTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCaGTTGGATTCAGCGGGCCCGaGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATSGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTOGTACAGAAGACCCTTCAAIATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCCTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTTCAACAATATCATTCC 1400
1401 TTCCTCTCAAAXCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCTGGCCA 1520
1521 GATTAGCACCCTCAGAGTTAACATCACTGCACCACTTTCT 1560
1561 CAAAGaTATCGTGTCAGGATTCGTTACGCATCTMCACTa 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGSGTAACTTCTCCGCAACCAXGTCAAGCGGC&GCAAC 1680
1681 TTGCAAffCCGGCAGCTTCAG«^CGra 1720
1721 CTTTO^CTTCTCTAACGGAXCAaGCGTTTTQlCCCTllAG 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGA^TGTACATTGA^ 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCTTCGaGGCTG 1840
1841 AGTAC 1845.

D. A structural gene encoding an insecticidal protein derived from B.t.k. HD-73, having the sequence:
1 ATG^C^CAACCCA^CATC^CSAAIGCaTTCCATA(a 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGSTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCAGCGAGT3CGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACAICATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGAIGCATTCCTGGTGCAaATT
241 GAGCAG7TGATCAACCAGAGGATCGAAGAGTTCGCCAGGA
281 ACCAGGCCATCTCTAGGTTGGAAGGAXTGAGCAATCTCTA
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT
401 TCAACGACATGAACAGCGCC7TTGACCACAGCTATCCCATT
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG
481 TACGTTCAAGCAGCTAA2CTTCACCTCAGCGTGCTTCGAG
521 ACGTTAGCGTGTTTGGGCAAAGG7GGGGATTCGATGCTGC
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG
641 GCTTGGAGCGTGTCTGGGGTCCTGArrCTAGAGATTGGAT
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA
761 CCTACCCTATCCGTACAGTGTCCCAACTTACCAGAGAAAT
801 CTAIACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACA1CTTGAACAGCA!TAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACA6A 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTAXCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCITGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCX 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACaACATCATCGC 1400
1401 ATCCGATAGTATTACTCA^CCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTGlACGGTTCTGTCATTTCAGGaCCAGGATTCA 1480
1481 CTGGTGGAGACCTCGTTAGACTO^ACAGCAGTGGAAATAA 1520
1521 CA^CAGAATAGAGGGT2kTATTGAAGTTCCAAITCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACCJrTaATTGGGGT&A 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTAC21GCTACC 1680
1681 TCCTTGGATAATCTCOUirCG^GCGATTTCGGTTRCTTTC 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTASAAACXTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GACAGATTCGAGTTCATTCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTAATGCG 1880
1881 CTGTTTACGTCTACAAACCAGCTTGGACTCAAGACAAATG 1920
1921 G 1921.

E. A structural gene encoding the full-length insecticidal protein of B.t.k. HD-73, having the sequence:
1 ATGGACAACAACCCAAACATCAACGAATGCATTCCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCTGCTCA6CGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCEA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCAIT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCC7CTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGT7AGCGTG7TTGGGCAAAGGTGGGGA77CGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACaACACTG 640
641 GCTTGGAGCCTGTCTGGGGTCCTC^^ 680
681 TAGAI&CA^CAGTTCAGGAfiagaaiTGACCCTOlCaGTT 720
721 TTC3GACATTGTGTCTCTCTTCCC6AACTATGACTCCA6AA 760
761 CCTACCCTATCCGTACAGTGTCCCaACnaCCafiaGaAAT 800
801 CTATACTAACCCAGXTCTTGaGAaCTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTATCGAAGGCTCCATCAGGAGCC 880
881 CACACTTGATGGACATCTTGAACAGCAIAACXAXCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTaCTGGTCTGG&CaCCaG 960
961 ATCATGGCCTCTCCAC3TTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGaAACGCCGCTCCaCA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTSTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTrTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCAXCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCaCAGAACAACA&TGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACAACATCAXCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA 1480
1481 C2GGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGATAATCTCCAATCCAGCGATTrCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACATCTTCACTCGGTAACATCGT 1760
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC 1800
1801 GaCAGATTCGAGTTCATTCCAffTTACTGCAACACTCGaGG 1840
1841 CTSaaTATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
1881 GCTGTTTACGTCTACAAACCAGCTCGGCCTCAAGACCaAT 1920
1921 GTGACGGATTATCATATTGATCAAGTGTCCAACTTGGTGA 1960
1961 CCTACCTCAGCGATGAGTTCTGTCTGGATGAAA&GCGAGA 2000
2001 AXTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAX 2040
2041 GAACGCAAXTTACTCCAAGATTCAAA!rTT 2080
2081 ATAG6CAACCAGAACGTGGGTGGGGCGGA&GTACAGGGAT 2120
2121 TACCATCCAGGGAGGTGACGACGTGTTCAAGGAGAACTAC 2160
2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200
2201 ACCTCTACCAGAAGATCGACGAGTCCAAGTTGAAAGCCTT 2240
2241 TACCCGTTATCAATTAAGAGGGTATATCGAA6ATAGTCAA 2280
2281 GACCTCGAGAXCTACCTCATCCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGAITCCGCGCCACACCT^GAATGGAATCCTGACTTAGArr 2440
2441 GTTCGTGT&GG6ATGGAGAAAAGTCTGCCC&IC&XTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGT&CJUaACriA 2520
2521 A&TGAGGACCTAGCTCTATGGCTGA^ 2560
2561 CGCMGATGGGCACGCAAGACTAGGGi^CTAGJUSTTTCT 2600
2601 CGAAGAGAAACCATTAGxAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAGAAGT 2680
2681 TGGAATGGGAGACCA&CaTCqTCTACaaAGaGGCaAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTCTACG 2920
2921 ATGCCAGAAACGTCATCAAGAACGGTGACTTCAACAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000
3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAX 3040
3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080
3081 TCGTGGCTATATCCTTCGTGTC&CAGCGTACAAGGAGGGA 3120
3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 AIACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
3201 AAXCTATCCAAATAACACGGTAACGTGTAATGAITATACT 3240
3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGAIAXAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGA&GAAAAATCGTATACAGATGGACGA 3360
3361 AGAGAGAATCCT3GTGAATTTAACAGAGGGTAIAGGGATT 3400
3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440
3441 RXACTTCCCAGAAACCGAIAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.



F. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73, having the sequence:
1 ATGGACAACAACCCAAACATCAACGAAIGCAXTCCAIACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG 120
121 TCCTTGACACAGTTTCrGCTCAGCGAGTTCGTGCCaGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCAXCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGKATGCATTCCTGGTGC&A&IT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCATCTCTAGGTTGGAAGGAOTGAGCAA^CTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCAGCGTGCTICGAG 520
521 ACGTTAGCGTGTTTGGGCaAAGGTGGGGATTCGATGCTGC 560
561 ^CCATCAATAGCCGTTACAACGACCn^ACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGAXICTAGAGArrGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCTCTCTTCCCGAACTATGACTCCAGAA 760
761 CCTACCCTAICCGTACAGTGTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTCTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTAICGAAGGCTCCATCAGGAGCC 880
881 CACACXTGATGGACATCTTGAACAGOVTAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCrrCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGriCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 XACAGAAAGAGCGGAACCCTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT
1361 CTTGGATACACCCTAGTGCTGAGTTCAACAACATCATCGC
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTCAAGGGAAAC
1441 TTTCTCTTCAACGGTTCTGTCATTTCAGGACCAGGATTCA
1481 CTGGTGGAGACCTCGTTAGACTCAACAGCAGTGGAAAXAA
1521 CATTCAGAATAGAGGGiCATATTGAAGTTCCAAITCACTTC
1561 CCAICCACATCTACCAGATATAGAGTTCGTGXGAGGTATG
1601 CTTCTGTGACCCCTA2TCACCTCAACGTTAATTGGGGTAA
1641 TTCATCCATCTTCTCCAATACAGTTCCAGCTACAGCTACC
1681 TCCTTGGATAATCTCCAATCCAGCGATITCGGTTACTrTG
1721 AAAGTGCCAATGCTrTTACATCTTCACTCGGTAACATCGT
1761 GGGTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTATC
1801 GACAGATTCGAGTICATTCCAGTTACTGCAACACTCGAGG
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAAXGC
1881 GCTGTTIACGTCTACAAACC^CTAGGGCTAAAAACAAAI
1921 GTAACGGATTATCAJAKGATCAAGTGXCCAA3TrAGTTA
1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA
2001 ATTGTCCGAGAAAGTCAAACAXGCGAAGCGACTCAGTGAT
2041 GAACGCAATTTACTCCAAGATTCAAA!rorCAAAGACATTA
2081 ATAGGCAACCAGAACGTGGG7GGGGCGGAAGTACAGGGAT 2120
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGA2U^TTAC 2160
2161 GTCACACTATCAGGTACCTTTGArGAGTGCTATCCAACAT 2200
2201 ATTTGTATCAAAAAATCG&TGAATCAAAATTAAAAGCCTT 2240
2241 TACCCGTTATCAAtTAAGAGGiSTATATCGAAGATAGTCaA 2280
2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320
2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
2361 TJCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440
2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640
2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
2681 TGGAAXGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
2801 ?.TAAACGTGTTCATAGCArTCGAGAAGCTTAXCTGCCTGA 2840
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
2921 A?GCGAGAAAXGTCATTAAAAATGGTGAXTTTaATAATGG 2960
2961 CTTATCCTGCTGGAACGTGAAAGGGC^^ 3000
3001 GAACAAAACAACCA&CGTTCGCT 3040
3041 GGoAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGG<; 3080
3081 TCGTGGCTATATCCTTCGTGTCacaGCGTACaAGGAGGGa 3120
3121 TAIGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
3161 AIACAGACGAACTGA^TTTAGCAACTGCGTAGAAGAGGA 3200
3201 AATCTATCCAAATAACACGGTAACGTGTAATGJOT^IACT 3240
3241 GTAAAXCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
3321 TGCGTCAGTCTATGAAGAAAaATCGTATACAGfllGGACGA 3360
3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400
3401 ZC&CGCCACTACCAGTTGGTTATG7GACAAAAGAATTAGA 3440
3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534.
G. A structural gene encoding a full-length insecticidal protein of B.t.k. HD-73, having the sequence:
1 ATGGAGAACAACCCAAACATCAACGAATGCATICCATACA 40
41 ACTGCTTGAGTAACCCAGAAGTTGAAGTACTTGGTGGAGA 80
81 ACGCATTGAAACCG6TTACACTCCCAICGACAICTCCTTG 120
121 TCC2TGACACAGTTTCTGCTCAGCGAGTTCGTGCCAGGTG 160
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT 200
201 CTTTGGTCCATCTCAATGGGATGCATTCCTGGTGCAAATT 240
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA 280
281 ACCAGGCCA3CTCTAGGTTGGAAGGATTGAGCAATCTCTA 320
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCGAT 360
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTATTCAAT 400
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT 440
441 GTTCGCAGTCCAGAACTACCAAGTTCCTCTCTTGTCCGTG 480
481 TACG7TCAAGCAGCTAATCTTCACCTCAGCGTGCTTCGAG 520
521 ACGTTAGCGTG77TGGSCAAAGG7GGGGA7TCGATGCTGC 560
561 AACCATCAATAGCCGTTACAACGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
EP 0 385 962 B1 

681 TAGaiACAACCAGTTG^GGAGAGAATTGACCCTCACAGTT 720
721 TTGCLCArTCTGTCTCTCTTCCCGAACTaiSACTCaiGaA 760
761 CCTACCCTATCCGTACAGTGTCC 800
801 CTATACTAACCCAGTTCTTGA^ 840
841 CGTGGTTCTGCCCAAGGTATCGAAi^^ 880
881 CACACTTGATGGACATCTTGAACAGCA^^ 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ATCATGGCCTCTCCAGTTGGArTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTAXGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 GTTCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTrGGACGAAATCC 1240
1241 CACCACAGAACAACAATG7GCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGACTC 1320
1321 AGCAACAGTtCCGTGAGCATCATCAGAGCTCCTAXGTTCT 1360
1361 CTTGGATACACCGTAGTGCTGAGTTCAACA*ICA2CATCGC 1400
1401 ATCCGATAGTATTACTCAAATCCCTGCAGTGAXGGGAAAC 1440
1441 TTTCTCTTCAACSGTTCTGTCAITTCAGG&CCaGGAITCa 1480
1481 CTGG1GGAGACCTCGTTAGACTCAACAGCAGTGGAAATAA 1520
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCCACATCTACCAGATATAGAGTTCGTGTGAGGTATG 1600
1601 CTTCTGTGACCCCTATTCACCTCAACGTTAATTGGGGTAA 1640
1641 TXCAXCCAXCTTCTCCAATACAGTTCCAGCTACAGCTACC 1680
1681 TCCTTGGAXJ^CTCCAATCCAGCGAITTCGGTTACTTTG 1720
1721 AAAGTGCCAATGCTTTTACAICTTC^ 1760
1761 GGCTGTTAGAAACTTTAGTGGGACTGCAGGAGTGATTAIC 1800
1801 GACAGATTCGAGTTCA2TCCAGTTACTGCAACACTCGAGG 1840
1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880
1881 CCrCl'rrACCTCCACCAAICAGCT7GGCTTGAAAACrAAC 1920
1921 GTTACTGACTATCACATTGACCAAGTGXCCAACTTGGTCA 1960
1961 CCTACCrrAGCGAXGAGTTCTGCCTCGACGAGAAGCGTGA 2000
2001 ACTCTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGAC 2040
2041 GAGAGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCA 2080
2081 ACAGGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGAI 2120
2121 CACCArCCAAGGAGGCGACGATGTGTTCAAGGAGAACTAC 2160
2161 GTCACCCTCTCCGGAACXTTCGACGAQTGCTACCCTACCT 2200
2201 ACTTGTACCAGAAGAXCGATGAGTCCAAACTCAAAGCCTT 2240
2241 CACCAGGTATCAACTTAGAGGCTACATCGAAGACAfiCC^ 2280
2281 GACCTTGAAATCTACTCGAXCAGGTACAATGCCAAGCACG 2320
2321 AGACCGTQAATtrrCCCAGGTACTGGTTCCCTCTGGCCACT 2360
2361 TTCTGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAAC 2400
2401 AGATGCGCTCCACACCTTGAGTGGAAXCCTGACTTGGACT 2440
2441 GCTCCTGCAGGGATGGCGAGAAGTGTGCCCACCAITCTCA 2480
2481 TCACTTCTCCTTGGACATCGAIGTGGGAXGIACTGACCTG 2520
2521 AATGAGGACCTCGGAGTC7GGGTCA7CTTCAAGATCAAGA 2560
2561 CCCAAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCT 2600
2601 CGAAGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTG 2640
2641 AAGAGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAAC 2680
2681 TCGAATGGGA&ACTAACAXCGTTTACAAGGAGGCCAAAjGA 2720
2721 GTCCGTGGAl'GCTTTGTlCGTGAACXCCCAATAirGAXCAG 2760
2761 TTGCAAGCCGACACCAACATCGCCATGATCCACGCCGCAG 2800
2801 ACAAACGTGTGCACAGCATTCGTGAGGCTCACTTGCCTGA 2840
2841 GTTGTCCGTGATCCCTGGTGTGAACGCTGCCATCTTCGAG 2880
2881 GaACTTGAGGGACGT&TCTTTACCGCarrCTCCTTGTACg 2920
2921 ATGCCAGAMCGTCATCAAGAACGGTGACTTCAACaa.rGG 2960
2961 CCTCAGCTGCTGGAATGTGAAAGGTCATGTGGACCTGGAG 3000
3001 G^CAGAAC2yLICAGCGTrCCGTCCTGGTTGTGCCTGAGT 3040
3041 GGGAAGCTGAAGTGTCCCAAGiUOTTAGAGTCTGTCG^ 3080
3081 TAGAGGCTACAXTCTCCCTGTGACCGCT^ 3120
3121 TACGGTGAGGGTTGCGTGACCATCCACGAGATCGSGAACa 3160
3161 ACACCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGA 3200
3201 AATCTATCCCAACAACACCGTIACTTGCAACGACTACACT 3240
3241 GTG^ATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGIA 3280
3281 ACAGAGGTTA£AACGAAGCTCCTTCCGTTCC7GCTGACTA 3320
3321 TGCCTCCGTGTACGAGGAGftAATCCTACACAGATGGCAGA 3360
3361 CGTGAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACT 3400
3401 ACACACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGA 3440
3441 GTACrrrCCTGAGACCGACAAAGTGTGGATCGAGATCGGT 3480
3481 GAAACCGAGGGAACCTrCATCGTGGACAGCGTGGAGCTTC 3520
3521 TCTTG&IGGAGGAA 3534.

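These listings were recovered by character recognition, so a quick consistency check helps flag lines that are still damaged. The sketch below assumes every line follows the `start SEQUENCE end` layout used throughout the listings, where the final line may end with a period. The function name `check_listing_line` is hypothetical. The check reports lines whose base count does not match their position numbers, lines containing any character other than A, C, G and T, and lines that do not follow the layout at all.

```python
def check_listing_line(line: str) -> list:
    """Report problems in one 'start SEQUENCE end' line of a sequence listing."""
    parts = line.split()
    # Expect exactly three fields: start position, bases, end position
    # (the final line of a listing may end with a period, e.g. "3534.").
    if (len(parts) != 3 or not parts[0].isdigit()
            or not parts[2].rstrip(".").isdigit()):
        return ["not in 'start SEQUENCE end' form"]
    start, seq, end = int(parts[0]), parts[1], int(parts[2].rstrip("."))
    problems = []
    # Positions are 1-based and inclusive, so a line holds end - start + 1 bases.
    if end - start + 1 != len(seq):
        problems.append(f"length {len(seq)} does not match positions {start}-{end}")
    # Anything other than A, C, G or T is treated as recognition damage.
    bad = sorted(set(seq) - set("ACGT"))
    if bad:
        problems.append("non-nucleotide characters: " + "".join(bad))
    return problems


for line in [
    "1841 CTGAGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGC 1880",
    "2161 GTCACCCTCTCCGGAACXTTCGACGAQTGCTACCCTACCT 2200",
    "5 61 CGTCAAIGAGAATTACAACAGACT 600",
]:
    print(line.split()[0], check_listing_line(line))
```

A line that passes both checks can still contain a misread base that happens to be a valid letter. This check cannot replace a comparison with an independent copy of the same sequence.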
H. A structural gene encoding a B.t.t. insecticidal protein, comprising the sequence:
1 ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA 40
41 CCACT&AGGATG77AXCCAGAAGGGTATCTCCGTTGTGGG 80
81 AGACCTCTTGGGCGTGGflTGGAOTCCCTTCGGTGGAGCC 120
121 CTCGTGAGCTTCTAIACAAACTTTCTCAACACCATTTGGC 160
161 CAAGCGAGGACCCTTGGAAAGCATTCATGGAGCAAGTTGA 200
201 AGCTCTTATGGATCAGAAGATTGCAGATTATGCC2UIGAAC 240
241 AAGGCTTTGGCAGAACTCCAGGGCCrrCAGAACAATGTGG 280
281 AGGACTACGTGAGTGCATTGTCCAGCTGGCAGAAGAACCC 320
321 TGTTAGCTCCAGAAATCCTCACAGCCAAGGTAGGATCAGA 360
361 GAGTTGTTCTCTCAAGCCGAATCCCACTTCAGAAATTCCA 400
401 TGCCTAGCTTTGCTATCTCCGGTTACGAGGTTCTTTTCCT 440
441 CACTACCTATGCTCAAGCTGCCAACACCCACTTisTTTCTC 480
481 CTTAAGGACGCTCAAATCTATGGAGAAGAGTGGGGATACG 520
521 AGAAAGAGGACAXTGCTGAGTTCTACAAGCG2CAACTEA& 560
561 GCTCACCCAAGAGTACACTGACCATTGCGTGAAATGGTAT 600
601 AACGTTGGTCTCGATAAGCTCAGAGGCTCTTCCTACGAGT 640
641 C7TGGG7GAACTTCAACAGAZACAGGAGAGASAXGACCTT 680
681 GACTGTGCTCGATCTTATCGCACTCTTTCCCTTGTACGAT 720
721 GTGAGACTCTACCCAAAGGAAGTGA&AACTG&GCTTACCa. 760
761 GAGACGTGCTCACTGACCCTATTGTCGGAGTCAACAACCT 800
801 TAGGGGTTATGGAACTACCTTCAGCAATATCGAAAACTAC 840
841 ATTAGGAAACCACATCTCTTCGACTAXCTTCACAGA&TTC 880
881 AArrCCACACAAGGTTTCAACCAGGATACTATGGTAACGA 920
921 CTCCTTCAACTATTGGTCCGGTAACTATGTTTCCACCAGA 960
961 CCAAGCATTGGATCTAA7GACATCATCACATCTCCCTTCT 1000
1001 ATGGTAACAAGTCCAGTGAACCTGTGCAGAACCTTGAGTT 1040
1041 CAACG^GAGAAAGTCTATAGAGCCGTCGCAAACACCAAT 1080
1081 C-CGCTGTGTGGCCAXCCGCAGTTTACTCAGGCGTCACAA 1120
1121 AGGTGGAGTTTAGTCAGTATAACGATCAGACCGArGAGGC 1160
1161 CAGCACCCAGACTTACGACTCCAAACGTAACGTTGGCGCA 1200
1201 GTCTCTTGGGATTCTATCGACCAATTGCCTCCAGAAACCA 1240
1241 CAGACGAACCATTGGAGAAGGGCTACAGCCACGAACTI&A 1280
1281 CTATGTGATGTGCTTCTTGATGCAAGGTTCCAGAGGGACC 1320
1321 ATTCCAGTGTTGACCTGGACACACAAGTCCGTGGACTTCT 1360
1361 TCAACATGAICGAIAGCAAGAAGATCACTCAACTTCCCTT 1400
1401 GGTGAAAGCCTACAAGC7GCAATCTGGTGCTTCCGTTGTC 1440
1441 GCAGGTCCCAGATTCACTGGAGG7GACATCATCCAGTGCA 1480
1481 caGAGAACGGCAGCGCAGCTACTATCTACGTGACACCTGA 1520
1521 TGTGTCTTACTCTCAGAAGTACAGGGCACGTATTCATTAC 1560
1561 GCATCTACCAGCCAGATCACCTTCACACTCAGCTTGGATG 1600
1601 GAGCACCCTTCAACCAGTATTACTTTGACAAGACCATCAA 1640
1641 CAAAGGTGACACTCTCACATACAATAGCTTCAACTTGGCA 1680
1681 AGTTTCAGCACACCATTTGAACTCTCAGGCAACAATCTrC 1720
1721 AGATCGGCGTCACCGGTCTCAGCGCCGGAG&CAAAGTCTA 1760
1761 CATCGACAAGATTGAGTTCATCCCAGTGAAC 1791.

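The claim H sequence begins with ATG. Its 1791 bases equal exactly 3 × 597 codons. As an illustration only, the sketch below translates complete codons using the standard genetic code. The helper `translate` and the way the table is built are assumptions of this example; neither comes from the patent. Line 1761 begins with the last base of codon 587 (positions 1759 to 1761), so dropping that first base puts the line back in frame.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in T, C, A, G order at
# each of the three positions; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate the complete in-frame codons of a DNA string."""
    seq = seq.upper()
    # Step through whole codons only; a trailing partial codon is ignored.
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


first_line = "ATGACTGCAGACAACAACACCGAAGCCCTCGACAGTTCTA"  # positions 1-40
last_line = "CATCGACAAGATTGAGTTCATCCCAGTGAAC"          # positions 1761-1791
print(translate(first_line))      # MTADNNTEALDSS
print(translate(last_line[1:]))   # IDKIEFIPVN
```

The table raises `KeyError` on any codon that contains a damaged character. That behaviour is deliberate: it keeps a misread base from being silently translated.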
I. A structural gene encoding a B.t. entomocidus insecticidal protein, comprising the sequence:
1 ATGGAGGAGAACAACCAAAACCJ^GCanrirCA3aCAACr 40
41 GCTTGaGTAACCCAGAAGAGGTATTGCTTGATGGAGAACG 80
81 CATTTCAACCQ3TAACTCTTCCATCGACATCTCCTTGTCC 120
121 TTGGTCCAGTTTCTGGTCAGCAACTTCGTGCCAGGTGGTG 160
161 GGTTCCTTGTCGGACTAATTGACTTCGTCTGGGGTATCGT 200
201 TGGTCC&TCTCA&TGGGATGCATTCCTGGTGCAAATTGAG 240
241 CAGTTGATCAACGAGAGGATCGCTGAGTTCGCCAGGAACG 280
281 CTGCCATCGCTAACTrGGAAGGATTGGGCAATA^CTTC&A 320
321 CATCTATGTGGAGGCCTTCAAAGAGTGGGAAGAGGACCCT 360
361 AACAACCCAGAGACCCGCACTAGGGTGATCGACAGATTCA 400
401 GAATCTTGGACGGCCTCTTGGAGAGAGATAarCCCATCCTT 440
441 CAGAATCTCTGGCTTCGAAGrrCCTCTCTTGXCCGTGTAC 480
481 GCTCAAGCAGCTAATCTTCACCTCGCTATCCTTCGAGACA 520
521 6TGTCATCTTTGGSGAAAG6TGGSGATTGACCACT&TCAA 560
561 CGTCAAIGAGAATTACAACAGACT 600
601 GAGTACGCCGACCACTGTGCTAACACCTACAACCGTGGCT 640
641 XGAACAATCTCCCTAAGTCTACTTATCAAGArrGGATTAC 680
681 CTACAACAGGTTGAGGAGAGACTTGACCCTCACAGTTTTG 720
721 GACATTGCAGCTTTCTTCCCGAAC7ATGACAACAGGAGAT 760
761 ACCC7ATCCAACCAGTGGGTCAAC7TACCAGAGAAGTCTA 800
801 TACTGACCCACTTATCAACTTCAACCCTCAGTrGCAAAGT 840
841 GTCGCCCAACTTCCCACATTCAACGTCATGGAGTCCAGCC 880
881 GTATCAGGAACCCACACTTGTTTGACATCTTGAACAACCT 920
921 TACTATCTTCACCGAXTGGTTCAGCGTTGGGCGTAACTTC 960
961 TATTGGGGTGGACACAGGGTCATCTCCTCTCTTATTGGAG 1000
1001 GTGGGAACATTACCTC7CCTATCTATGGACGTGAGGCAAA 1040
1041 CCAGGAGCCACCACGTACTTTCACCTTCAACGGTCCAGTC 1080
1081 TTCAGAACCTTGTCTAACCCTACCTTGAGATTGCTCCAGC 1120
1121 AACCTTGGCCAGCTCCACCTTTCAACCTTAGAGGTGTTGA 1160
1161 GGGCGTTGAGTTCTCTACTCCXACCAACTCCTTCACTTAC 1200
1201 AGAGGTAGAGGAACCGTTGATTCCTTGACCGAACTCCCAC 1240
1241 CAGAG^CAAXaGCGTCCCACCCaGGGaAGGCTACTCCCA
1281 CAGGTTGTGCCaCGCAACCTTCGTQCAGCGTTCCGSaACT
1321 CCATTCCTCACTACAGGAGTTGTGTTCTCaTGGACTGSIC
1361 GTAGTGCTACTCTCACTAATACCATTGATCCCGAGAGGAT
1401 CAATCAAaTCCCATTGCTCAAGGGTTTCCGTGTGTGGGGA
1441 GGAACTTCTGTCATCACAGGACCAGGCTTCACAGGAGGTG
1481 ATATTCTTAGAAGAAACACTTTTGGCGACTTTGTGAGCCT
1521 CCAAGTTAACATCAACTCTCCAATTACTC^AAGAIATCGT
1561 CTCAGGTTTCGTTACGCATCrrCCCGTGACGCTAGAGTCA
1601 XCGTGCTCACCGGAGCAGCTrCIACCGGTGTCGGTGGACA
1641 AGTCTCCGTGAACATGCCACTCCAGAAGACTATGGAGATC
1681 GGCGAGAACTTGACATCCAGGACCTTCAGATACACCGACT
1721 TCTCTAACCCTTTCAGTrrCCGTGCCAACCCTGACATCAT
1761 TGGCATTAGCGAACAACCTCTCTTTGGAGCTGGTAGCATC
1801 TCATCXGGCGAATTGTACATTGACAAGATTGAGAZCAX7C
1841 TTGCCGACGCTACCTTCGAGGCTGAGTCTGACCTTGAGAG
1881 AGCCCAGAAGGCTGTGAACGCCCTCTTTACCTCCTCTAAT
1921 CAGATTGGCTTGAAAACTGACGTTACTGACTATCACATTG
1961 ACCAAGTGTCCAACTTGGTCGACTGCCTTAGCGATGAGTT
2001 CTGCCTCGACGAGAAGCGTGAACTCTCCGAGAAAGTTAAA 2040
2041 CACGCCAAGCGTCTCAGCGACGAGAGGAATCTCTTGCAAG 2080
2081 ACCCCAACTTCAGAGGCATCAACAGGCAGCCAGACCGTGG 2120
2121 TTGGAGAGGJIRGCACCGACATCACCATCCAAGGaGGCGAC 2160
2161 GATCTGTTCAAGGAGAACTACGTCACCCTCCCAGGAACTG 2200
2201 TGG^GAGTGCTACCCTJ^CTACTTG^ACCAGAAG2lICGA 2240
2241 TGAGTCCAAACTCAAAGCCTACACCAGGTATGAACTTAGA 2280
2281 C-GCTACATCGAAGACAGCCAAGACCrTGAAATCTACCTCA 2320
2321 TCAGGTACAATGCCAAGCACGAGATCGTGAATGTCCCAGG 2360
2361 TACTGGTTCCCTCTGGCGACTTTCTGCCCAAATGCCCAXT 2400
2401 GGGAAGTGTGGAGAGCCTAACAGATGCGCTCCACACCTTG 2440
2441 AGTGGAATCCTGACTTGGACTGCTCCTGCAGGGATGGCGA 2480
2481 GAAGTGTGCCCACCAITCTCATCACTTCACCTTGGACATC 2520
2521 GATGTGGGATGTACTGACCTGAATGAGGACCTCGGAGTCT 2560
2561 GGGTCATCTTCAAGATCAAGACCCAAGACGGACACGCAAG 2600
2601 ACTTGGCAACCTTGAGTTTCTCGAAGAGAAACCATTGCTC 2640
2641 GGTGAAGCTCTCGCTCGTGTGAAGAGAGCAGAGAAGAAGT 2680
2681 GGAGGGACAAACGTGAGAAACTCCAACTCGAGACTAACAT 2720
2721 CGTTrACAAGGAGGCCAAAGAGTCCGTGGATGCTOrGTTC 2760
2761 6TGAACTCCCAATATGATAGGTTGCAAS36GAGACCAACA 2800
2801 ?CGCCAIGAICCACGC7GCAS&aU^CG7GTGCACAS€&T
2841 TCGTGAGGCTTACTTGCCTGAGTTCTCCCTGATCCCTGGT
2881 GTGAACGCTGCCATCTTCGAGGAACTTGAGGGACGIAl'CT
2921 TTACCGCATACTCCTTGTACGAXGCCAGAAACGTC&XC&A
2961 GAACGGTGACTTC2ACAATG6CCTCTT6TGCT6G21ATG7G
3001 AAAGGTCATGTGGACGTGGAGGAACAGAACAAS^^
3041 CCGTCCTGGTTATCCCTGAGTGGGAAGCTGAAGTGTCCCA
3081 AGAGGTTAGAGTCTGTCCAGGTAGAGGCTACATTCTCCGT
3121 GTGACCGCTTACAAGGAGGGATACGGTGAGGGTTGCGTGA
3161 CCATCCACGAGATCGAGGACAACACCGACGAGCTTAAGTT
3201 CTCCAACTGCGTCGAGGAAGAAGTCTATCCCAACAACACC
3241 GTTACTTGCAACAACTACACTGGGACCCAGGAAGAGTACG 3280
3281 AAGGTACCTACACTAGCCGTAACCAAGGTTACGACGAAGC 3320
3321 TTACGGAAACAATCCTTCCGTTCCTGCTGACTAXGCCTCC 3360
3361 GTGTACGAGGAGAAATCCTACACAGATGGCAGACCTGAGA 3400
3401 ACCCTTGCGAGTCC^CAGAGGTTACGCTGACTACACACC 3440
3441 ACnCCAGCAGGCTAXGTTACCAAGGACCTTGAGTACTXT 3480
3481 CCTGAGACCGACAAAGTGTGGATCGAGATCGGTGAAACCG 3520
3521 AGGGAACCTTC^CGTGGACAGCCTGGAGCTTCTCTTGAT 3560
3561 GGAGGAA 3567.

J. A structural gene encoding a P2 insecticidal protein, comprising the sequence:
1 ATGGACAACAACGTCTTGAACTCTGGTAGAACA&CCATCT 40
41 GCGACGCATAO^CCSCGTGSCTCACC^CCAITCAGCTT 80
81 CGA&CACAAGAGCCTCGACACTAXTCAGAAGGAGTGGATG 120
121 GAATGGAAACGTACTGACCACTCTCTCTACGTCGCACCTG 160
161 TGGTTGGAACAGTGTCCAGCTTCCTTCTCAAGAAGGTCGG 200
201 CTCTCTCATCGGAAAACGTATCTTGTCCGAACTCTGGGGT 240
241 ATCATCTTTCCATCTGGGTCCACTAATCTCATGCAAGACA 280
281 TCTTGAGGGAGACCGAACAGTTTCTCAACCAGCGTCTCAA 320
321 CACTGaXACCTTGGCTAGAGTCaACSCTGAGTTGATCGGT 360
361 CTCCAAGCAAACATTCGTGAGTTCAACCAGCAAGTGGACA 400
401 ACT7CTTGAATCCAACTCAGAATCCTGTGCCTCTTTCCAT 440
441 CACTTCTTCCGTGAACACTATGCAGCAACTCTTCCTCAAC 480
481 AGATTGCCTCAGTTTCAGATTCAAGGCTACCAGTTGCTCC 520
521 TTCrrCCACTCTTTGCTCAGGCTGCCAACATGC^CTTGTC 560
561 CTrCAXACGTGACGTGATCCTCAACGCTGACGAATGGGGA 600
601 ATCrCTGCAGCCACTCTTAGGACATAC^ 640
641 6GAACTACACTCGTGATTACTCGVACTATTGCATCAACAC 680
681 TTATCAGACTGCCTTTCGTGGACTCAAXACTAGGCTTC^ 720
721 GACATGCTTGAGTTC&GGACCTACATGTTCCTTAACGTGT 760
761 TTGAGTACGTCAGCATTTGGAGTCTCTTCAAGTACCAGAG 800
801 CTTGATGGTG7CCTCTGGAGCCAATCTCTACGCCTCTGGC 840
841 AGTGGACCACAGCAAACTCAGAGCTTCACAGCTCAGAACT 880
881 GGCCATTCTTCTATAGCTTGTTCCAAGTCAACTCCAACTA
921 CATTCTCAGTGGTATCTCTGGGACCAGACTCTCCATAACC
961 XCTCCCAAGIXTGGTGGACTTCCIAGGCTCCJLCTACAACCC
1001 ATAC3CCTTAACTCTGCCAGAGTGAACTACAGTGGAGGTGT
1041 CAGCTCTGGATTGATTGGTGCAACTAACTTGAACCACAAC
1081 TTCAATTGCTCCACCGTCT1GCCACCTCTGAGCACACCGT
1121 TTGTGAGGTCCTGGCTTGACAGCGGTACTGATCGCGAAGG 1160
1161 AGTTGCTACCTCTAGAAACTGGCAAACCGAGTCCTTCCAA
1201 ACCACTCTTAGCCTTCGGTGTGGAGCTTTCTCTGCACGTG
1241 GGAATTCAAACTACTTTCCAGACTACTTCAJTTAGGAACAT 1280
1281 CTCTGGTGTTCCTCTCGTCATCAGGAATGAAGACCTCACC 1320
1321 CGTCCACTTCATTACAACCAGATTAGGAACATCGAGTCTC 1360
1361 CATCCGGTACTCCAGGAGGTGCAAGAGCTTACCTCGTGTC 1400
1401 TG7CCATAACAGGAAGAACAACATCTACGCTGCCAACGAG 1440
1441 AATGGCACCATGAITCACCTTGCACCAGAAGATTACACTG 1480
1481 GATTCACCATCTCTCCAAJCCATGCTACCCAAGTGAACAA 1520
1521 TCAGACaCC^CCTTC^CrcCG2UUyiGTTCGG2^IC^ 1560
1561 GGTGaCTCCTTGAGGTTCSAGC&ATCCaACACTACCGCTA 1600
1601 GOTACaCTTTGAGAGGCaATGGAAACaGCTACaACCTTTA 1640
1641 CTXGAGACTTAGCTCCATTGGTAACTCCACCATCCGTGTT 1680
1681 ACCATCAACGGACGTGTTTACACAGTCTCTAATGTGAACA 1720
1721 CTACAACGAACAATGATGGCGTTAACGACAACGGAGCCAG 1760
1761 ATTCAGCGACATCAACATTGGCAACATCGTGGCCTCTGAC 1800
1801 AACACTAACGTTACTTTGGACATCAATGTGACCCTCAATT 1840
1841 C-GGAACTCCATTTGATCTCATGAACATCATGTTXGTGCC 1880
1881 AACTAACCTCCCTCCATTGTAC 1902

K. A structural gene sequence encoding a fusion protein comprising the 610 N-terminal amino acids of B.t.k. HD-1 and the 567 C-terminal amino acids of B.t.k. HD-73, said gene comprising the sequence:






EP 0 385 962 B1 
1 AXGG2£AACAACCCAA^^
41 ACTGCTIGAGTAACCCAGA&GTrGAAGTACTTGGTGGAG^
81 ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG
121 TCCTTGACACAGTTTCTGCTCAGCGaGTTCGTGCCAGGXG
161 CTGGGTTCGTTCTCGGACTAGTTGACATCATCTGGGGTAT
201 CTTTGGTCCATCTCAAXGGGAIGCAITCCTGGTGCAAATT
241 GAGCAGTTGATCAACCAGAGGATCGAAGAGTTCGCCAGGA
281 ACCAGGCCATCTCTAGGTTGGAAGGATTGAGCAATCTCTA
321 CCAAATCTATGCAGAGAGCTTCAGAGAGTGGGAAGCCG&T
361 CCTACTAACCCAGCTCTCCGCGAGGAAATGCGTAITCAAT
401 TCAACGACATGAACAGCGCCTTGACCACAGCTATCCCATT
441 GTTCGCAGTCCaGAACTaCCAAGTTCCTCTCTTGTCCGTG 480
481 TACGTTCAAGCAGCTAATCTTCACCTCaGCGTGCTTCGAG 520
521 ACGTTAGCGTGTTTGGGCAAAGGTGGGGATTCGATGCTGC 560
561 AACCATCAATAGCCGTTAC21&CGACCTTACTAGGCTGATT 600
601 GGAAACTACACCGACCACGCTGTTCGTTGGTACAACACTG 640
641 GCTTGGAGCGTGTCTGGGGTCCTGATTCTAGAGATTGGAT 680
681 TAGATACAACCAGTTCAGGAGAGAATTGACCCTCACAGTT 720
721 TTGGACATTGTGTCXCTCTTCCC.GAACTATGACTCCAGAA 760
761 CCTACCCTATCCGTACAGTCTCCCAACTTACCAGAGAAAT 800
801 CTATACTAACCCAGTTJTTGAGAACTTCGACGGTAGCTTC 840
841 CGTGGTTCTGCCCAAGGTAICGAAGGCTCCATCAGGAGCC 880
881 CACACTTGaiGGACATCTTGAACAGCATAACTATCTACAC 920
921 CGATGCTCACAGAGGAGAGTATTACTGGTCTGGACACCAG 960
961 ^TCATGGCCTCTCCAGTTGGATTCAGCGGGCCCGAGTTTA 1000
1001 CCTTTCCTCTCTATGGAACTATGGGAAACGCCGCTCCACA 1040
1041 ACAACGTATCGTTGCTCAACTAGGTCAGGGTGTCTACAGA 1080
1081 ACCTTGTCTTCCACCTTGTACAGAAGACCCTTCAATATCG 1120
1121 GTATCAACAACCAGCAACTTTCCGTTCTTGACGGAACAGA 1160
1161 G7TCGCCTATGGAACCTCTTCTAACTTGCCATCCGCTGTT 1200
1201 TACAGAAAGAGCGGAACCGTTGATTCCTTGGACGAAATCC 1240
1241 CACCACAGAACAACAATGTGCCACCCAGGCAAGGATTCTC 1280
1281 CCACAGGTTGAGCCACGTGTCCATGTTCCGTTCCGGATTC 1320
1321 AGCAACAGTTCCGTGAGCATCATCAGAGCTCCTATGTTCT 1360
1361 CATGGATTCATCGTAGTGCTGAGTXCAACAAXATCATTCC 1400
1401 TTCCTCTCAAAXCACCCAAATCCCATTGACCAAGTCTACT 1440
1441 AACCTTGGATCTGGAACTTCTGTCGTGAAAGGACCAGGCT 1480
1481 TCACAGGAGGTGATATTCTTAGAAGAACTTCTCCIGGCCA 1520
1521 GATTAGCACCCTCAGafiTTAAOlICACTGCACCACTTTCT 1560
1561 CAAAGATATCGTGTCAGGATTCGTTACGCATCTACCACTA 1600
1601 ACTTGCAATTCCACACCTCCATCGACGGAAGGCCTATCAA 1640
1641 TCAGGGTAACTTCTCCGCAACCATGTCAAGCGGCAGCAAC 1680
1681 TTGCAATCCGGCAGCTTCAGaACCGTCGGTTXCACTACTC 1720
1721 CTTTCAACTTCTCTAACGGATCAAGCGTTT^ 1760
1761 CGCTCATGTGTTCAATTCTGGCAATGAilGTGTACATTGAC 1800
1801 CGTATTGAGTTTGTGCCTGCCGAAGTTACCCTCGAGGCTG 1840
1841 AGTACAACCTTGAGAGAGCCCAGAAGGCTGTGAACGCCCT 1880
1881 CTTTACCTCCACCAATCAGCTTGGCTTGAAAACTAACGTT 1920
1921 ACTGACTATCACATTGACCAAGTGTCCAACTTGGTCACCT 1960
1961 ACCTTAGCGATGAGTTCTGCCTCGACGAGAAGCGTGAACT 2000
2001 CTCCGAGAAAGTTAAACACGCCAAGCGTCTCAGCGACGAG 2040
2041 AGGAATCTCTTGCAAGACTCCAACTTCAAAGACATCAACA 2080
2081 GGCAGCCAGAACGTGGTTGGGGTGGAAGCACCGGGATCAC 2120
2121 CATCCAAGGAGGCGACGATGTGTTCAAGGAGAACTACGTC 2160
2161 ACCCTCTCCGGAACTTTCGACGAGTGCTACCCTACCTACT 2200
2201 TGTACCAGAAGATCGATGAGTCCAAACTCAAAGCCTTCAC 2240
2241 CAGGTATCAACTTAGAGGCTACATCGAAGACAGCCAAGAC 2280
2281 CTTGAAATCTACTCGATCAGGTACAATGCCAAGCACGAGA 2320
2321 CCGTGAATGTCCCAGCTACTGGTTCCCTCTGGCCACTTTC 2360
2361 TGCCCAATCTCCCATTGGGAAGTGTGGAGAGCCTAACAGA 2400
2401 TGCGCTCCACACCTTGAGTGGAATCCTGACTTGGACTGCT 2440
2441 CCTGCAGGGATGGCGAGAAGTGTGCCCACCATTCTCATCA 2480
2481 CTTCTCCTTGGACATCGATGTGGGATGTACTGACCTGAAT 2520
2521 GAGGACCTCGGAGTCTGGGTCATCTTCAAGATCAAGACCC 2560
2561 AAGACGGACACGCAAGACTTGGCAACCTTGAGTTTCTCGA 2600
2601 AGAGAAACCATTGGTCGGTGAAGCTCTCGCTCGTGTGAAG 2640
2641 AGAGCAGAGAAGAAGTGGAGGGACAAACGTGAGAAACTCG 2680
2681 AATGGGAAACTAACATCGTTTACAAGGAGGCCAAAGAGTC 2720
2721 CGTGGATGCTTTGTTCGTGAACTCCCAATATGATCAGTTG 2760
2761 CAAGCCGACACCAACATCGCCATGATCCACGCCGCAGACA 2800
2801 AACGTGTGCACAGCATTCGTGAGGCTTACTTGCCTGAGTT 2840
2841 GTCCGTGATCCCTGCTGTGAACGCTGCCATCTTCGAGGAA 2880
2881 CTTGAGGGACGTATCTTTACCGCATTCTCCTTGTACGATG 2920
2921 CCAGAAACGTCATCAAGAACGGTGACTTCAACAATGGCCT 2960
2961 CAGCTGCTGGAATGTGAAAGGTCATGTGGACGTGGAGGAA 3000
3001 CAGAACAATCAGCGTTCCGTCCTGGTTGTGCCTGAGTGGG 3040
3041 AAGCTGAAGTGTCCa^GAGGTTAGAGTCTGTCCaGGTAG
3081 AGGCTACATTCTCCGTGTGACCGCTTACAAGGAGGGATAC
3121 GGTGAGGGTTGCGTGACCATCCACGAGATCGAGAACAACA
3161 CCGACGAGCTTAAGTTCTCCAACTGCGTCGAGGAAGAAAT
3201 CTATCCCAACAACACCGTTACTTGCAACGACTACACTGTG
3241 AATCAGGAAGAGTACGGAGGTGCCTACACTAGCCGTAACA
3281 GAGGTTACAACGAAGCTCCTTCCGTTCCTGCTGACTATGC
3321 CTCCGTGTACGAGGAGAAATCCTACACAGATGGCAGACGT
3361 GAGAACCCTTGCGAGTTCAACAGAGGTTACAGGGACTACA
3401 CACCACTTCCAGTTGGCTATGTTACCAAGGAGCTTGAGTA
3441 CTTTCCTGAGACCGACAAACTGTGGATCGAGATCGGTGAA
3481 ACCGAGGGAACCTTCATCGTGGACAGCGTGGAGCTTCTCT
3521 TGATGGAGGAA 3531.
1 ATGGCTATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120
T C
121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200
C C C G C G
201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
T
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400
CC C C
401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440
G C C CC C CC C
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
521 TTGGCAACTATACAGATCATGCTGTACGCTGGTACAATAC 560
561 GGGATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGG 600
601 ATAAGATATAATCAATTTAGAAGAGAATTAACACTAACTG 640
CGCCGC GCT
641 TATTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAG 680
681 AACGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720

FIGURE 2A

721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
761 TTCGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAG 800
801 TCCACATTTGATGGATATACTTAATAGTATAACCATCTAT 840
841 ACGGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATC 880
C C C T C
881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
G C
921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
1001 GAACATTATCGTCCACCTTATATAGAAGACCTTTTAATAT 1040
C
1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080
C C C C
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320
G C C C C C
1321 CCTTCATCACAAATTACACAAATACCTTTAACAAAATCTA 1360
C C C AC C C G
1361 CTAATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGG 1400

FIGURE 2B

1401 ATTTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGC 1440
1441 CAGATTTCAACCTTAAGAGTAAATATTACTGCACCATTAT 1480
1481 CACAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCAC 1520
1521 AAATTTACAATTCCATACATCAATTGACGGAAGACCTATT 1560
CC T G C
1561 AATCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTA 1600
1601 ATTTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTAC 1640
1641 TCCGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTA 1680
1681 AGTGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAG 1720
1721 ATCGAATTGAATTTGTTCCGGCA 1743

FIGURE 2C

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC
41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T
81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC
121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CTGAG GCCCGCGA
161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T
201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G
241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC G C
281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C
321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C
361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A
401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT
441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG
481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C A T C T CC CAGC GC TC
521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T
561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G
601 GGCAACTATACAGATCATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT
641 GATTAGAGCGTGTATGGGGACCGGATTCTAGAGATTGGAT 680
C G C T T

FIGURE 3A
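In Figures 2 to 4, the letters printed beneath a sequence line appear to mark changed bases. Take positions 81 to 120 as an example. The letters under that line of Figure 3A, read left to right, are C C T C T C C C. These are exactly the bases at which claim K's line 81 differs from the upper Figure 3A sequence. The sketch below reproduces that comparison and also reports the G+C fraction of each line. The helper names are hypothetical, and the sketch assumes both lines cover the same positions with no gaps.

```python
def substituted_bases(upper: str, lower: str) -> str:
    """Bases of `lower` at each position where it differs from `upper`."""
    if len(upper) != len(lower):
        raise ValueError("lines must cover the same positions")
    return "".join(b for a, b in zip(upper, lower) if a != b)


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA string."""
    return sum(base in "GC" for base in seq) / len(seq)


figure_3a_81 = "AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG"  # Figure 3A, 81-120
claim_k_81 = "ACGCATTGAAACCGGTTACACTCCCATCGACATCTCCTTG"    # claim K, 81-120
print(substituted_bases(figure_3a_81, claim_k_81))         # CCTCTCCC
print(gc_fraction(figure_3a_81), gc_fraction(claim_k_81))  # 0.375 0.5
```

Across these 40 positions, the 8 substitutions raise the G+C count from 15 to 20 bases.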

681 AAGATATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
T CCGCG GCCAT
721 TTAGATATCGTTTCTCTATTTCCGAACTATGATAGTAGAA 760
G C T G C C CTCC
761 CGTATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCTCT G CTC
801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC
841 CGAGGCTCGGCTCAGGGCATAGAAGGAAGTATTAGGAGTC 880
T T T C A T C CTCC C C
881 CACATTTGATGGATATACTTAATAGTATAACCATCTATAC 920
C CCTGCC T C
921 GGATGCTCATAGAGGAGAATATTATTGGTCAGGGCATCAA 960
C C GCTACG
961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C A T A CAGC C G T
1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C
1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C
1081 ACATTATCGTCCACCTTATATAGAAGACCTTTTAATATAG 1120
CGT GC CC C
1121 GGATAAATAATGAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A
1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T
1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C
1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC
1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC
1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTCC 1400
AT G C C C

FIGURE 3B

1401 TTCATCACAAATTACACAAATACCTTTAACAAAATCTACT 1440
CT CC CAGCG
1441 AATCTTGGCTCTGGAACTTCTGTCGTTAAAGGACCAGGAT 1480
C A G C
1481 TTACAGGAGGAGATATTCTTCGAAGAACTTCACCTGGCCA 1520
C T A T
1521 GATTTCAACCTTAAGAGTAAATATTACTGCACCATTATCA 1560
AGC CC TCC CTT
1561 CAAAGATATCGGGTAAGAATTCGCTACGCTTCTACCACAA 1600
T C G T A A
1601 ATTTACAATTCCATACATCAATTGACGGAAGACCTATTAA 1640
CG CCCC G C
1641 TCAGGGGAATTTTTCAGCAACTATGAGTAGTGGGAGTAAT 1680
T C C C C TCA CCCC
1681 TTACAGTCCGGAAGCTTTAGGACTGTAGGTTTTACTACTC 1720
GA C CACC C
1721 CGTTTAACTTTTCAAATGGATCAAGTGTATTTACGTTAAG 1760
TC CTC CTCCCT
1761 TGCTCATGTCTTCAATTCAGGCAATGAAGTTTATATAGAT 1800
C G T G C T C
1801 CGAATTGAATTTGTTCCGGCAGAAGTAACCTTTGAGGCAG 1840
T G GTC T C T
1841 AATAT 1845
G C

FIGURE 3C

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC
41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G AT C T
81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC
121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160
CTGAG GCCCGCGA
161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T
201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G
241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC GGC G C
281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320
G C G G T G C
321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360
C C T GAGC C C
361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A
401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT
441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG
481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C AT C T CC CAGC GC TC
521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560
C AGC G C T
561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600
AC C CCCCT G
601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT
641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A

FIGURE 4A

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT
721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC
761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCCTCT G CTC
801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC
841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C
881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C
921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
C CAAGG C TACG
961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C ATA CAGC C G T
1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C
1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C
1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C
1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A
1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T
1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240
G C T CT C C
1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC
1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC
1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360
C C TCC G C C C
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C

FIGURE 4B

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C
1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480
C C C C C
1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C
1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560
1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
C A GA
1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T
1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C
1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C
1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760
C C C C
1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800
G C T C
1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C G C
1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
A TGCG
1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
CTGT ACGTCTACA C AGCT G ACTC G CA TG
1921 G 1921

1 GAAAGAATAGAAACTGGTTACACCCCAATCGATATTTCCT 40
ATGGCC T C T C C C
41 TGTCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGG 80
CTGAG GCCCGCGA
81 TGCTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGA 120
GC TCC CCC T
121 ATTTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAA 160
C A T C G G
161 TTGAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAG 200
G GC GGC G C
201 GAACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTT 240
G C G G T G C
241 TATCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAG 280
C C T GAGC C C
281 ATCCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCA 320
C TC CC C G A
321 ATTCAATGACATGAACAGTGCCCTTACAACCGCTATTCCT 360
C CTGCA CA
361 CTTTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAG 400
TGC CGCC CGC
401 TATATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAG 440
G C A T C T CC CAGC GC TC
441 AGATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCC 480
C AGC G C T
481 GCGACTATCAATAGTCGTTATAATGATTTAACTAGGCTTA 520
AC C CCCCT G
521 TTGGCAACTATACAGATTATGCTGTACGCTGGTACAATAC 560
A CCCCC TT C
561 GGGATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGG 600
T C G G C T T
601 GTAAGGTATAATCAATTTAGAAGAGAATTAACACTAACTG 640
ATACCGCG GCCA
641 TATTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAG 680
T G C T GT C C CTCC
681 AAGATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAA 720
CCCTCT G CTC
721 ATTTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTT 760
C T TCTGCCC C
761 TTCGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAG 800
CTTTCATC G CTCC C
801 TCCACATTTGATGGATATACTTAACAGTATAACCATCTAT 840
C C CCTG C T C
841 ACGGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATC 880
C CAAGG C TAC
881 AAATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATT 920
G C C ATA CAGC C G
921 CACTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCA 960
T C T C C C
961 CAACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATA 1000
C TCC
1001 GAACATTATCGTCCACTTTATATAGAAGACCTTTTAATAT 1040
CGT CGC CC
1041 AGGGATAAATAATCAACAACTATCTGTTCTTGACGGGACA 1080
CTCCCG TC A
1081 GAATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTG 1120
G C C T T C
1121 TATACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAAT 1160
T G C T CT C
1161 ACCGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTT 1200
C A C T C C
1201 AGTCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCT 1240
TCC CA G G CGC C CA
1241 TTAGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTT 1280
C C C TCC G C C C
1281 CTCTTGGATACATCGTAGTGCTGAATTTAATAATATAATT 1320
C G C C C C C
1321 GCATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAA 1360
C
1361 ACTTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATT 1400
C C C C
1401 TACTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAAT 1440
C ACC CCCC
1441 AACATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACT 1480
1481 TCCCATCGACATCTACCAGATATCGAGTTCGTGTACGGTA 1520
C A GA
1521 TGCTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGT 1560
G T
1561 AATTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTA 1600
C C T
1601 CGTCATTAGATAATCTACAATCAAGTGATTTTGGTTATTT 1640
CCG C CC C C
1641 TGAAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATA 1680
C C C C
1681 GTAGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAA 1720
G C T
1721 TAGACAGATTTGAATTTATTCCAGTTACTGCAACACTCGA 1760
C C G C
1761 GGCTGAA 1767
G
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• • • • 

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40 
CCA C AC 

• • • • 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80 
C C G A T C T 

« « • • 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

• • • • 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200 
GCTCC CCC T 

« • • • . 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280
G GC G GC.G C 

• • • • 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T G C 

• • • • 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC C C 

• • • • 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G A 

• • • * 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT 

• • • * 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG 

• • • • 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

• • • • 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640
A CCCCC TT CT 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680 
C G G C T T A 
681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760 
G C T GT C C CTCC 

• • • * 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCCTCT G CTC 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880 
TTTCATC G CTCC C C 

• • • • 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 
C C CT G C T C

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 
C C A AG G C TACG 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C ATA CAGC C G T

• • • • 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C 

• • • • 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C 

• • • • 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

• • • • 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T 

i • • • * 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240 
G C T CT C C

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC 

■ • • * 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC 

• • • * 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360

C C TCC G C C C 

• • • • 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400 
C G C C C C C 
1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480

C C C C C

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
A C C C C C C 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

• ♦ • • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T 

• • ♦ • 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C 

• • • * 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720 
C G C C C C C 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760

C C C C 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800 
G C T C 

• • • • 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880 



1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 



« • • * 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960



« • • ♦ 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000



2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040



2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080



2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120 
• • • •

2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200

• « • • 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360

■ • • • 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400

• * • * 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

• t t • 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600

• • * * 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800

• • « • 

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 
• « • •

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

• • • • 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 

• • • * 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960 

• * • • 

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

• • • • 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120 

• • * • 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400

• ■ • • 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480

• - 

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520
3521 TCCTTATGGAGGAA 3534 
• • * *

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40 
CCA C AC 

• • • • 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T 

• • • • 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120 
CCT C TC CC 

• • • • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

• • • * 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
G C TC C C C C T 

• • • • 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240
C A T C G G

• • • * 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G GC GGC G C 

• • • * 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T G C 

• • • * 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC C C 

• • • « 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400
C TC CC C G A 

• « • • 

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT 

• • • • 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480
GC CGCC CGCG 

• • • • 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520
C A T C T CC CAGC GC TC 

• ♦ • • 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

• • • ■ 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

• • • • 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640 
A CCCCC TT CT 
• • * •

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A 

681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760
G C T GT C C CTCC 

• • • • 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800
CCCTCT G CTC 

• • • 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C 

• • • • 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920 
C C CT G C T C 

• • • • 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960
C CAAGG C TACG 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C ATA CAGC C G T 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040 
CTC C C 

• 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C 

1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160
TCCCG TC A 


1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240 
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280 
A C T C CTC 

• 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC 

* * • 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360

C C TCC G C C C 
1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C 

• • • • 

1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440 
C 

• • • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480

C C C C C

• • • • 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520
ACC C C C C 

• • • • 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600
C A GA 

• • • • 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640
G T 

• • • • 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680
C C T C 

• • • • 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C 

• • 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760

C C C C 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840 
C G C 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880



1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920 

G C C C G C

• * 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960
G C G G 

• • • • 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000
C CC CAGC G C 

• • ♦ • 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120



2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160 
G TC GCGGC 

• • • • 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200



2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240 
CCCCGG CGCGG 

• • • • 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280



2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320 
C C G CC C C

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360 

• • • • 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440

• • • « 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480 

« • * • 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520

• i • • 

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600

• • • * 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680 

G G 

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720 
G C C C C 

• • * . * 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 
2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
« • • •

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840 

• • • • 

2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880 

• • • * 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920 

C C 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960 
C C CGC CCC

• ■ • • 

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 

• * * • 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040

« • • • 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160

• • • •

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200

• • • ■ 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440

■ • • • 

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480 

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520

3521 TCCTTATGGAGGAA 3534
• • • •

1 ATGGATAACAATCCGAACATCAATGAATGCATTCCTTATA 40
CCA C AC 

41 ATTGTTTAAGTAACCCTGAAGTAGAAGTATTAGGTGGAGA 80
C C G A T C T 

81 AAGAATAGAAACTGGTTACACCCCAATCGATATTTCCTTG 120
CCT C TC CC 

• • • • 

121 TCGCTAACGCAATTTCTTTTGAGTGAATTTGTTCCCGGTG 160 
CTGAG GCCCGCGA 

161 CTGGATTTGTGTTAGGACTAGTTGATATAATATGGGGAAT 200
GCTCC CCC T 

• ■ « • 

201 TTTTGGTCCCTCTCAATGGGACGCATTTCTTGTACAAATT 240 
C A T C G G 

241 GAACAGTTAATTAACCAAAGAATAGAAGAATTCGCTAGGA 280 
G GC GGC G C 

281 ACCAAGCCATTTCTAGATTAGAAGGACTAAGCAATCTTTA 320 
G C G G T G C 

321 TCAAATTTACGCAGAATCTTTTAGAGAGTGGGAAGCAGAT 360 
C C T GAGC C C

• « • • 

361 CCTACTAATCCAGCATTAAGAGAAGAGATGCGTATTCAAT 400 
C TC CC C G A

401 TCAATGACATGAACAGTGCCCTTACAACCGCTATTCCTCT 440
C CTGCA CAT 

• • • • 

441 TTTTGCAGTTCAAAATTATCAAGTTCCTCTTTTATCAGTA 480 
GC CGCC CGCG 

481 TATGTTCAAGCTGCAAATTTACATTTATCAGTTTTGAGAG 520 
C A T C T CC CAGC GC TC 

521 ATGTTTCAGTGTTTGGACAAAGGTGGGGATTTGATGCCGC 560 
C AGC G C T 

• • • » 

561 GACTATCAATAGTCGTTATAATGATTTAACTAGGCTTATT 600 
AC C CCCCT G 

601 GGCAACTATACAGATTATGCTGTACGCTGGTACAATACGG 640 
A CCCCC TT CT 

• • • • 

641 GATTAGAACGTGTATGGGGACCGGATTCTAGAGATTGGGT 680
C G G C T T A 
681 AAGGTATAATCAATTTAGAAGAGAATTAACACTAACTGTA 720
TACCGCG GCCAT 

• ■ • • 

721 TTAGATATCGTTGCTCTGTTCCCGAATTATGATAGTAGAA 760 
G C T GT C C CTCC 

• • • • 

761 GATATCCAATTCGAACAGTTTCCCAATTAACAAGAGAAAT 800 
CCCTCT G CTC 

• • • • 

801 TTATACAAACCCAGTATTAGAAAATTTTGATGGTAGTTTT 840
C T TCTGCCC CC 

841 CGAGGCTCGGCTCAGGGCATAGAAAGAAGTATTAGGAGTC 880
TTTCATC G CTCC C C 

• • • • 

881 CACATTTGATGGATATACTTAACAGTATAACCATCTATAC 920
C C CT G C T C 

• • • • 

921 GGATGCTCATAGGGGTTATTATTATTGGTCAGGGCATCAA 960 
C CAAGG C T A C G 

961 ATAATGGCTTCTCCTGTAGGGTTTTCGGGGCCAGAATTCA 1000
C C ATA CAGC C G T 

• • • • 

1001 CTTTTCCGCTATATGGAACTATGGGAAATGCAGCTCCACA 1040
CTC C C 

• • # • 

1041 ACAACGTATTGTTGCTCAACTAGGTCAGGGCGTGTATAGA 1080
C T C C 

• • 
1081 ACATTATCGTCCACTTTATATAGAAGACCTTTTAATATAG 1120
CGT CGC CC C 

• • • • 

1121 GGATAAATAATCAACAACTATCTGTTCTTGACGGGACAGA 1160 
TCCCG TC A 

• • • • 

1161 ATTTGCTTATGGAACCTCCTCAAATTTGCCATCCGCTGTA 1200
G C C T T C T 

1201 TACAGAAAAAGCGGAACGGTAGATTCGCTGGATGAAATAC 1240 
G C T CT C C 

1241 CGCCACAGAATAACAACGTGCCACCTAGGCAAGGATTTAG 1280
A C T C CTC 

1281 TCATCGATTAAGCCATGTTTCAATGTTTCGTTCAGGCTTT 1320
CCAGG CGC C CAC 

• * • • 

1321 AGTAATAGTAGTGTAAGTATAATAAGAGCTCCTATGTTCT 1360 
C C TCC G C C C 

• • V • 

1361 CTTGGATACATCGTAGTGCTGAATTTAATAATATAATTGC 1400
C G C C C C C
1401 ATCGGATAGTATTACTCAAATCCCTGCAGTGAAGGGAAAC 1440
C 

• • • • 

1441 TTTCTTTTTAATGGTTCTGTAATTTCAGGACCAGGATTTA 1480

C C C C C

• • • • 

1481 CTGGTGGGGACTTAGTTAGATTAAATAGTAGTGGAAATAA 1520 
A C C C C C C 

, • • • 

1521 CATTCAGAATAGAGGGTATATTGAAGTTCCAATTCACTTC 1560



1561 CCATCGACATCTACCAGATATCGAGTTCGTGTACGGTATG 1600 
C A GA 

• • • * 

1601 CTTCTGTAACCCCGATTCACCTCAACGTTAATTGGGGTAA 1640 

G T 

1641 TTCATCCATTTTTTCCAATACAGTACCAGCTACAGCTACG 1680 
C C T C 

1681 TCATTAGATAATCTACAATCAAGTGATTTTGGTTATTTTG 1720
C G C C C C C 

. • • * 

1721 AAAGTGCCAATGCTTTTACATCTTCATTAGGTAATATAGT 1760

C C C C 

« • • • 

1761 AGGTGTTAGAAATTTTAGTGGGACTGCAGGAGTGATAATA 1800
G C T C 

1801 GACAGATTTGAATTTATTCCAGTTACTGCAACACTCGAGG 1840
C G C 

• • • • 

1841 CTGAATATAATCTGGAAAGAGCGCAGAAGGCGGTGAATGC 1880
GCCTG C T C 

1881 GCTGTTTACGTCTACAAACCAACTAGGGCTAAAAACAAAT 1920
CC CCCTGTCTG TC 

♦ • • * 

1921 GTAACGGATTATCATATTGATCAAGTGTCCAATTTAGTTA 1960 
TTC C C CGC 

1961 CGTATTTATCGGATGAATTTTGTCTGGATGAAAAGCGAGA 2000 
C CC TAGC G C C C C G T 

♦ • • 

2001 ATTGTCCGAGAAAGTCAAACATGCGAAGCGACTCAGTGAT 2040 
CC T CC T CC 

2041 GAACGCAATTTACTCCAAGATTCAAATTTCAAAGACATTA 2080
GAGCCTG CCC C 

* 

2081 ATAGGCAACCAGAACGTGGGTGGGGCGGAAGTACAGGGAT 2120
C G T T C C 
2121 TACCATCCAAGGAGGGGATGACGTATTTAAAGAAAATTAC 2160
C CCTGCGGC 

2161 GTCACACTATCAGGTACCTTTGATGAGTGCTATCCAACAT 2200 
CCCATCC CTC 

• • • . 

2201 ATTTGTATCAAAAAATCGATGAATCAAAATTAAAAGCCTT 2240
C CGG GCCC 

2241 TACCCGTTATCAATTAAGAGGGTATATCGAAGATAGTCAA 2280 
C AG CT CC CC

■ • • • 

2281 GACTTAGAAATCTATTTAATTCGCTACAATGCAAAACATG 2320
CT CCGCAG CGC 

• • • • 

2321 AAACAGTAAATGTGCCAGGTACGGGTTCCTTATGGCCGCT 2360
GCG C T CC A 

2361 TTCAGCCCAAAGTCCAATCGGAAAGTGTGGAGAGCCGAAT 2400
T TC C T G T C 

• • • • 

2401 CGATGCGCGCCACACCTTGAATGGAATCCTGACTTAGATT 2440 
A T G G C 

• • • • 

2441 GTTCGTGTAGGGATGGAGAAAAGTGTGCCCATCATTCGCA 2480
C C C C G C T 

• • • • 

2481 TCATTTCTCCTTAGACATTGATGTAGGATGTACAGACTTA 2520
C GCG T C G 

• • • * 

2521 AATGAGGACCTAGGTGTATGGGTGATCTTTAAGATTAAGA 2560
C A C C C C 

• • ■ • 

2561 CGCAAGATGGGCACGCAAGACTAGGGAATCTAGAGTTTCT 2600
C C A T C C T 

• • • • 

2601 CGAAGAGAAACCATTAGTAGGAGAAGCGCTAGCTCGTGTG 2640

G C T T C 

2641 AAAAGAGCGGAGAAAAAATGGAGAGACAAACGTGAAAAAT 2680
G A G G G G C

2681 TGGAATGGGAAACAAATATCGTTTATAAAGAGGCAAAAGA 2720
C T C CGC 

2721 ATCTGTAGATGCTTTATTTGTAAACTCTCAATATGATCAA 2760 
GCG GCG C G 

2761 TTACAAGCGGATACGAATATTGCCATGATTCATGCGGCAG 2800
G CCCCC CCC

2801 ATAAACGTGTTCATAGCATTCGAGAAGCTTATCTGCCTGA 2840
C G C T G CT 
2841 GCTGTCTGTGATTCCGGGTGTCAATGCGGCTATTTTTGAA 2880
T C CT GCTCCCG 

2881 GAATTAGAAGGGCGTATTTTCACTGCATTCTCCCTATATG 2920
CTGA CTC TGC 

2921 ATGCGAGAAATGTCATTAAAAATGGTGATTTTAATAATGG 2960 
C C CGC CCC 

2961 CTTATCCTGCTGGAACGTGAAAGGGCATGTAGATGTAGAA 3000 
C CAG T T G C G G 

3001 GAACAAAACAACCAACGTTCGGTCCTTGTTGTTCCGGAAT 3040 
G TG C G GTG 

3041 GGGAAGCAGAAGTGTCACAAGAAGTTCGTGTCTGTCCGGG 3080 
T C G A A A 

< * • • 

3081 TCGTGGCTATATCCTTCGTGTCACAGCGTACAAGGAGGGA 3120
A A CTC GCT 

• • • • * 

3121 TATGGAGAAGGTTGCGTAACCATTCATGAGATCGAGAACA 3160
C T G G C C 

• * • • 

3161 ATACAGACGAACTGAAGTTTAGCAACTGCGTAGAAGAGGA 3200
C C G T CTC C G A 

• • • • 

3201 AATCTATCCAAATAACACGGTAACGTGTAATGATTATACT 3240
CC CTTCCCC 

• • • • 

3241 GTAAATCAAGAAGAATACGGAGGTGCGTACACTTCTCGTA 3280
G G G C AGC

• • • • 

3281 ATCGAGGATATAACGAAGCTCCTTCCGTACCAGCTGATTA 3320
CA T C T T C 

• • • * 

3321 TGCGTCAGTCTATGAAGAAAAATCGTATACAGATGGACGA 3360
C C G C G G CC CA 

3361 AGAGAGAATCCTTGTGAATTTAACAGAGGGTATAGGGATT 3400 
CT C CGC TC C 

3401 ACACGCCACTACCAGTTGGTTATGTGACAAAAGAATTAGA 3440 
A T C TCGGCT 

3441 ATACTTCCCAGAAACCGATAAGGTATGGATTGAGATTGGA 3480
G TTG CAG C CT 

3481 GAAACGGAAGGAACATTTATCGTGGACAGCGTGGAATTAC 3520 
C G C C GC T 

3521 TCCTTATGGAGGAA 3534 
T G 
1 ATGACTGCAGATAATAATACGGAAGCACTAGATAGCTCTA 40
CCCC CCCT 

• • • ■ 

41 CAACAAAAGATGTCATTCAAAAAGGCATTTCCGTAGTAGG 80
CTG TCGGTC TG

• • • • 

81 TGATCTCCTAGGCGTAGTAGGTTTCCCGTTTGGTGGAGCG 120
AC T G GTATCC C 

• • • * 

121 CTTGTTTCGTTTTATACAAACTTTTTAAATACTATTTGGC 160
C GAGC C CCCC 

• • • • 

161 CAAGTGAAGACCCGTGGAAGGCTTTTATGGAACAAGTAGA 200
CG T AAC G T 

« • * * 

201 AGCATTGATGGATCAGAAAATAGCTGATTATGCAAAAAAT 240
TCT G TA CGC 

• • • • 

241 AAAGCTCTTGCAGAGTTACAGGGCCTTCAAAATAATGTCG 280 
GTG ACC GC G 

281 AAGATTATGTGAGTGCATTGAGTTCATGGCAAAAAAATCC 320 
G C C TCCAGC G G C 

• • " • 

321 TGTGAGTTCACGAAATCCACATAGCCAGGGGCGGATAAGA 360
T C CA T C A TA C 

• • * • 

361 GAGCTGTTTTCTCAAGCAGAAAGTCATTTTCGTAATTCAA 400
T C C TCC C CA A C 

• • • • 

401 TGCCTTCGTTTGCAATTTCTGGATACGAGGTTCTATTTCT 440
AGC T C C T T C

• « • • 

441 AACAACATATGCACAAGCTGCCAACACACATTTATTTTTA 480
CTC T CCGCC 

• • • • 

481 CTAAAAGACGCTCAAATTTATGGAGAAGAATGGGGATACG 520 
T G C G 

• • • * 

521 AAAAAGAAGATATTGCTGAATTTTATAAAAGACAACTAAA 560 
G GC GCCGCT T 

. . • • 

561 ACTTACGCAAGAATATACTGACCATTGTGTCAAATGGTAT 600 

G C C G C C G 

♦ • 

601 AATGTTGGATTAGATAAATTAAGAGGTTCATCTTATGAAT 640
C TCC GCC CTCCG 

• • • • 

641 CTTGGGTAAACTTTAACCGTTATCGCAGAGAGATGACATT 680
G C A A CA G C 
681 AACAGTATTAGATTTAATTGCACTATTTCCATTGTATGAT 720
GTGCCCTC C C C

721 GTTCGGCTATACCCAAAAGAAGTTAAAACCGAATTAACAA 760
GAAC G G TGCTC

761 GAGACGTTTTAACAGATCCAATTGTCGGAGTCAACAACCT 800
GC C T C T

801 TAGGGGCTATGGAACAACCTTCTCTAATATAGAAAATTAT 840
T T AGC C C C

841 ATTCGAAAACCACATCTATTTGACTATCTGCATAGAATTC 880
AG C C T C

881 AATTTCACACGCGGTTCCAACCAGGATATTATGGAAATGA 920
C AA T C T C

921 CTCTTTCAATTATTGGTCCGGTAATTATGTTTCAACTAGA 960
C C C C C

961 CCAAGCATAGGATCAAATGATATAATCACATCTCCATTCT 1000
T T C C C

1001 ATGGAAATAAATCCAGTGAACCTGTACAAAATTTAGAATT 1040
T C G GGCCTG

1041 TAATGGAGAAAAAGTCTATAGAGCCGTAGCAAATACAAAT 1080
C C C G C C C

1081 CTTGCGGTCTGGCCGTCCGCTGTATATTCAGGTGTTACAA 1120
CTG A AT C C C

1121 AAGTGGAATTTAGCCAATATAATGATCAAACAGATGAAGC 1160
G G TG C GC G

1161 AAGTACACAAACGTACGACTCAAAAAGAAATGTTGGCGCG 1200
CCCGT CCTC A

1201 GTCAGCTGGGATTCTATCGATCAATTGCCTCCAGAAACAA 1240
TCT C C

1241 CAGATGAACCTCTAGAAAAGGGATATAGCCATCAACTCAA 1280
C AT GG CC C T

1281 TTATGT AATGTGCTTTTTAATuUAwjw i ft« x /wAwnnvn 1320
C G C G A TCC G C

1321 ATCCCAGTGTTAACTTGGACACATAAAAGTGTAGACTTTT 1360
T G C C GTCC G C

1361 TTAACATGATTGATTCGAAAAAAATTACACAACTTCCGTT 1400
C C AGC G G C T C
1401 AGTAAAGGCATATAAGTTACAATCTGGTGCTTCCGTTGTC 1440
G G A C C C G 

• • • • 

1441 GCAGGTCCTAGGTTTACAGGAGGAGATATCATTCAATGCA 1480 
CACT TC CG 

• « • • 

1481 CAGAAAATGGAAGTGCGGCAACTATTTACGTTACACCGGA 1520
GCCCAT C G T 

• t • • 

1521 TGTGTCGTACTCTCAAAAATATCGAGCTAGAATTCATTAT 1560 
T G G CA G AC T C 

• • • • 

1561 GCTTCTACATCTCAGATAACATTTACACTCAGTTTAGACG 1600 
A CAGC C C C C G T 

• • • • 

1601 GGGCACCATTTAATCAATACTATTTCGATAAAACGATAAA 1640 
A CCCGTCTCGCC 

• • • • 

1641 TAAAGGAGACACATTAACGTATAATTCATTTAATTTAGCA 1680
C T TC C A C AGC C C G 

• • • • 

1681 AGTTTCAGCACACCATTCGAATTATCAGGGAATAACTTAC 1720

T C C C C TC T 

» • • • 

1721 AAATAGGCGTCACAGGATTAAGTGCTGGAGATAAAGTTTA 1760
GC CTCCCC C C 

• • • 

1761 TATAGACAAAATTGAATTTATTCCAGTGAAT 1791
C C G G C C C 
1 ATG AATAATGTATTGAATAGTGGAAGAACAACTATTT 40
GAC C C C CTC T C C 

• • • * 

41 GTGATGCGTATAATGTAGTAGCCCATGATCCATTTAGTTT 80 
CCACCCGTC CC 

• • • < 

81 TGAACATAAATCATTAGATACCATCCAAAAAGAATGGATG 120 
C C GAGCC C C T T G G G 

• • • • 

121 GAGTGGAAAAGAACAGATCATAGTTTATATGTAGCTCCTG 160 
A C T T C CTC C C C C A 

• • « • 

161 TAGTCGGAACTGTGTCTAGTTTTTTGCTAAAGAAAGTGGG 200 
GT A CCCCTC GC 

• • • • 

201 G AGTCTTATTGG AAAAAGGATATTG AGTG AATTATGGGGG 240 
CTC C C CTC TCC C C T 

• • • • 

241 ATAATATTTCCTAGTGGTAGTACAAATCTAATGCAAGATA 280 
C C ATC GTCC T C C

• • • • 

281 TTTTAAGGGAGACAGAACAATTCCTAAATCAAAGACTTAA 320 
CG C GTCCGCTC 

• • • • 

321 TACAGATACCCTTGCTCGTGTAAATGCAGAATTGATAGGG 360 
CT TG AACCTG CT 

• • • « 

361 CTCCAAGCGAATATAAGGGAGTTTAATCAACAAGTAGATA 400 
ACTCT CCG GC 

• • • • 

401 ATTTTTTAAACCCTACTCAAAACCCTGTTCCTTTATCAAT 440
CCGTA GT G CTC 

• • * • • 

441 AACTTCTTCGGTTAATACAATGCAGCAATTATTTCTAAAT 480
C CGCT CCCCC 

• • • • 

481 AGATTACCCCAGTTCCAGATACAAGGATACCAGTTGTTAT 520
G T T T C C CC

• • • • 

521 TATTACCTTTATTTGCACAGGCAGCCAATATGCATCTTTC 560 
TC T AC C T T C CT G 

• • * * 

561 TTTTATTAGAGATGTTATTCTTAATGCAGATGAATGGGGT 600 
CCACTCGCCCTC A 

601 ATTTCAGCAGCAACATTACGTACGTATCGAGATTACCTGA 640
C T C TC TA G A CA C T 

641 GAAATTATACAAGAGATTATTCTAATTATTGTATAAATAC 680
GCCTCT CCC CCC 
681 GTATCAAACTGCGTTTAGAGGGTTAAACACCCGTTTACAC 720
T G C C T AC C T TA GC T 

■ • • • 

721 GATATGTTAGAATTTAGAACATATATGTTTTTAAATGTAT 760
C CTGCGCC CCTCG 

761 TTGAATATGTATCCATTTGGTCATTGTTTAAATATCAGAG 800 
G C CAG AGTC C C G C 

801 TCTTATGGTATCTTCTGGCGCTAATTTATATGCTAGCGGT 840 
CTG GC ACCCC CTCT C 

841 AGTGGACCACAGCAGACACAATCATTTACAGCACAAAACT 880

A T GAGC C T G 

• ■ • • • 

881 GGCCATTTTTATATTCTCTTTTCCAAGTTAATTCGAATTA 920 
C G AGCT G C C C C

• « • • 

921 TATATTATCTGGTATTAGTGGTACTAGGCTTTCTATTACC 960
C TC CAG CTC G C A C C A 

• • « t 

961 TTCCCTAATATTGGTGGTTTACCGGGTAGTACTACAACTC 1000 
T C C AC T A CTCC C 

• • • • 

1001 ATTCATTGAATAGTGCCAGGGTTAATTATAGCGGAGGAGT 1040
AGCC T CTC A G C C T T 

■ • • • 

1041 TTCATCTGGTCTCATAGGGGCGACTAATCTCAATCACAAC 1080
CAGC AT G T T A CT G C 

1081 TTTAATTGCAGCACGGTCCTCCCTCCTTTATCAACACCAT 1120 
C TC C T G A C GAGC G 

1121 TTGTTAGAAGTTGGCTGGATTCAGGTACAGATCGAGAGGG 1160
G GTCC T CAGC T C A 

1161 CGTTGCTACCTCTACGAATTGGCAGACAGAATCCTTTCAA 1200 
A A C A C G C 

1201 ACAACTTTAAGTTTAAGGTGTGGTGCTTTTTCAGCCCGTG 1240
C C T CC TC A C T A 

1241 GAAATTCAAACTATTTCCCAGATTATTTTATCCGTAATAT 1280
G CT CCCTAGC 

1281 TTCTGGGGTTCCTTTAGTTATTAGAAACGAAGATCTAACA 1320 
C T CCCCGT CCC 

1321 AGACCGTTACACTATAACCAAATAAGAAATATAGAAAGTC 1360
CTACTTC GTGCC GTC 

1361 CTTCGGGAACACCTGGTGGAGCACGGGCCTATTTGGTATC 1400 
ACTTAAT AATCCCG 
1401 TGTGCATAACAGAAAAAATAATATCTATGCCGCTAATGAA 1440
C GGCC CTCCG 

• • • • 

1441 AATGGTACTATGATCCATTTGGCGCCAGAAGATTATACAG 1480
CC TCCTA C T

• • * • 

1481 GATTTACTATATCGCCAATACATGCCACTCAAGTGAATAA 1520 
CCCT C TC C 

• • • • 

1521 TCAAACTCGAACATTTATTTCTGAAAAATTTGGAAATCAA 1560 
GACCCCC GC 

1561 GGTGATTCCTTAAGATTTGAACAAAGCAACACGACAGCTC 1600
C GGCGTC T C A

• • • • 

1601 GTTATACGCTTAGAGGGAATGGAAATAGTTACAATCTTTA 1640
GCTTG C CC C 

• • * • 

1641 TTTAAGAGTATCTTCAATAGGAAATTCAACTATTCGAGTT 1680
C G TAGC CTTCCCCT 

• • • • 

1681 ACTATAAACGGTAGAGTTTATACTGTTTCAAATGTTAATA 1720 
CC ACT CACT GC 

1721 CCACTACAAATAACGATGGAGTTAATGATAATGGAGCTCG 1760
TAGCT C CCC CA 

1761 TTTTTCAGATATTAATATCGGTAATATAGTAGCAAGTGAT 1800
A CAGC CCCTCCCG CTC C 

• • • • 

1801 AATACTAATGTAACGCTAGATATAAATGTGACATTAAACT 1840 
C CTTTGCC CCCT 

• • • • 

1841 CCGGTACTCCATTTGATCTCATGAATATTATGTTTGTGCC 1880
T A C C 

1881 AACTAATCTTCCACCACTTTAT 1902 
C C T T G C 
• • • •

1 ATGGAGGAAAATAATCAAAATCAATGCATACCTTACAATT 40 
G C C C T A C 

41 GTTTAAGTAATCCTGAAGAAGTACTTTTGGATGGAGAACG 80
CG CA GTGCT 

• • • • 

81 GATATCAACTGGTAATTCATCAATTGATATTTCTCTGTCA 120
CT C CTCCCCCT C 

• • • « 

121 CTTGTTCAGTTTCTGGTATCTAACTTTGTACCAGGGGGAG 160
T G C CAGC C G T T 

• • • • 

161 GATTTTTAGTTGGATTAATAGATTTTGTATGGGGAATAGT 200 
GCCTCC TCCC TC 

• < • • • 

201 TGGCCCTTCTCAATGGGATGCATTTCTAGTACAAATTGAA 240 
T A C G G G 

• • • • 

241 CAATTAATTAATGAAAGAATAGCTGAATTTGCTAGGAATG 280
GGCCGGC GCC C 

• • • • 

281 CTGCTATTGCTAATTTAGAAGGATTAGGAAACAATTTCAA 320
CC CG GCTC 

321 TATATATGTGGAAGCATTTAAAGAATGGGAAGAAGATCCT 360
CC GCC G GC 

361 AATAATCCAGAAACCAGGACCAGAGTAATTGATCGCTTTC 400 
C G CCTGGCCAACA 

401 GTATACTTGATGGGCTACTTGAAAGGGACATTCCTTCGTT 440
ACTGCCCTGGATCAC 

441 TCGAATTTCTGGATTTGAAGTACCCCTTTTATCCGTTTAT 480
CA C CC TTCG GC 

481 GCTCAAGCGGCCAATCTGCATCTAGCTATATTAAGAGATT 520
AT T C C CC TC CA 

• • • • 

521 CTGTAATTTTTGGAGAAAGATGGGGATTGACAACGATAAA 560 
GCC G G CTC 

• • • • 

561 TGTCAATGAAAACTATAATAGACTAATTAGGCATATTGAT 600
C G TCC TC C C 

• • * * 

601 GAATATGCTGATCACTGTGCAAATACGTATAATCGGGGAT 640
GCCC TCCCCTC 

• t • • 

641 TAAATAATTTACCGAAATCTACGTATCAAGATTGGATAAC 680
GCCCTG T T 

• • • • 

681 ATATAATCGATTACGGAGAGACTTAACATTGACTGTATTA 720
C C CA G GA G CC C A T G 
• • • •

721 GATATCGCCGCTTTCTTTCCAAACTATGACAATAGGAGAT 760 
C T A C G C 

• • • •

761 ATCCAATTCAGCCAGTTGGTCAACTAACAAGGGAAGTTTA 800 
CTCA G TCA C 

• • • • 

801 TACGGACCCATTAATTAATTTTAATCCACAGTTACAGTCT 840
T CT CCCT G AAG 

• • • • 

841 GTAGCTCAATTACCTACTTTTAACGTTATGGAGAGCAGCC 880 
CCCTCAC C TC 

. • • • 

881 GAATTAGAAATCCTCATTTATTTGATATATTGAATAATCT 920
TCGCACG CC CC 

• • • * 

921 TACAATCTTTACGGATTGGTTTAGTGTTGGACGCAATTTT 960 
T C C . CC GTCC 

• • • • 

961 TATTGGGGAGGACATCGAGTAATATCTAGCCTTATAGGAG 1000 
T CA G C C CTCT T

• # ♦ • 

1001 GTGGTAAC ATAAC ATCTCCTATAf ATGGAAGAGAGGCGAA 1040 
G T C C C T A 

1041 CCAGGAGCCTCCAAGATCCTTTACTTTTAATGGACCGGTA 1080 
A C TAGT C C C C T A C 

• • • • 

1081 TTTAGGACTTTATCAAATCCTACTTTACGATTATTACAGC 1120
CACGTC CGA GCC

1121 AACCTTGGCCAGCGCCACCATTTAATTTACGTGGTGTTGA 1160

T T C CC TA A 

1161 AGGAGTAGAATTTTCTACACCTACAAATAGCTTTACGTAT 1200
G C T G C T C CTC C T C 

• • • • 

1201 CGAGGAAGAGGTACGGTTGATTCTTTAACTGAATTACCGC 1240
A T AC CGCCCA 

1241 CTGAGGATAATAGTGTGCCACCTCGCGAAGGATATAGTCA 1280
A C C CA G C CTCC 

• • • • 

1281 TCGTTTATGTCATGCAACTTTTGTTCAAAGATCTGGAACA 1320
CAGG CC CCGGCTC T

• • • • 

1321 CCTTTTTTAACAACTGGTGTAGTATTTTCTTGGACCGATC 1360
ACCCTAATGCA T 

• • • ♦ 

1361 GTAGTGCAACTCTTACAAATACAATTGATCCAGAGAGAAT 1400 
T C T C C G 
• • • •

1401 TAATCAAATACCTTTAGTGAAAGGATTTAGAGTTTGGGGG 1440
C CAGCGTCCTG A 

1441 GGCACCTCTGTCATTACAGGACCAGGATTTACAGGAGGGG 1480
AT C C C T 

• • • • 

1481 ATATCCTTCGAAGAAATACCTTTGGTGATTTTGTATCTCT 1520 
T A C T C C GAGC 

1521 ACAAGTCAATATTAATTCACCAATTACCCAAAGATACCGT 1560
C TCCCT T T

1561 TTAAGATTTCGTTACGCTTCCAGTAGGGATGCACGAGTTA 1600 
C C G A TTCCC T C TA C

1601 TAGTATTAACAGGAGCGGCATCCACAGGAGTGGGAGGCCA 1640 
CGCCCCATTCTCTA 

• • • • 

1641 AGTTAGTGTAAATATGCCTCTTCAGAAAACTATGGAAATA 1680
CTCC G C AC G G C 

1681 GGGGAGAACTTAACATCTAGAACATTTAGATATACCGATT 1720 
C G CGCC C C 

1721 TTAGTAATCCTTTTTCATTTAGAGCTAATCCAGATATAAT 1760 
CTC C CAGT CC T C C T C C 

1761 TGGGATAAGTGAACAACCTCTATTTGGTGCAGGTTCTATT 1800 
CTC C AT AGC C 

• • • 

1801 AGTAGCGGTGAACTTTATATAGATAAAATTGAAATTATTC 1840
TCATCT C TGCTCG GC 

• • • • 

1841 TAGCAGATGCAACATTTGAAGCAGAATCTGATTTAGAAAG 1880
TCCTCCCGTG ACA CC T G 

■ • • * 

1881 AGCACAAAAGGCGGTGAATGCCCTGTTTACTTCTTCCAAT 1920
C G T C C C CA 

1921 CAAATCGGGTTAAAAACCGATGTGACGGATTATCATATTG 1960 
GC T C G TACTTC C

• • • * 

1961 ATCAAGTATCCAATTTAGTGGATTGTTTATCAGATGAATT 2000 
C G C G CACC ACC TAGC G 

2001 TTGTCTGGATGAAAAGCG AGAATTGTCCGAGAAAGTC AAA 2040 
CCCCG TCC T 

2041 CATGCGAAGCGACTCAGTGATGAGCGGAATTTACTTCAAG 2080 
CC T CCA CCTG 

• . • • 

2081 ATCCAAACTTCAGAGGGATCAATAGACAACCAGACCGTGG 2120
CT C A AC C G G A 



FIGURE 14C 






EP 0 385 962 B1 



2121 CTGGAGAGGAAGTACAGATATTACCATCCAAGGAGGAGAT 2160 
TGT CCGGC CC 

2161 GACGTATTCAAAGAGAATTACGTCACACTACCGGGTACCG 2200
TG G C CCTCATT 

• • • • 

2201 TTGATGAGTGCTATCCAACGTATTTATATCAGAAAATAGA 2240
CC CTCCGC GC 

2241 TGAGTCGAAATTAAAAGCTTATACCCGTTATGAATTAAGA 2280 
C CC CTC AG CCT 

• • • * 

2281 GGGTATATCGAAGATAGTCAAGACTTAGAAATCTATTTGA 2320 
CC CC CT CC 

• • • • 

2321 TCCGTTACAATGCAAAACACGAAATAGTAAATGTGCCAGG 2360 
AG CG GCCG C 

• • • • 

2361 CACGGGTTCCTTATGGCCGCTTTCAGCCCAAATGCCAATC 2400
XT C C A T TCT C T 

2401 GGAAAGTGTGGAGAACCGAATCGATGCGCGCCACACCTTG 2440
G G T CA T 

• « • • 

2441 AATGGAATCCTGATCTAGATTGTTCCTGCAGAGACGGGGA 2480 
G CTGCC GTC 

• • • • 

2481 AAAATGTGCACATCATTCCCATCATTTCACCTTGGATATT 2520 
GG CC T CT CC

• • • • 

2521 GATGTTGGATGTACAGACTTAAATGAGGACTTAGGTGTAT 2560 
G TCG CCAC 

• * • • 

2561 GGGTGATATTCAAGATTAAGACGCAAGATGGCCATGCAAG 2600
C C C C C A C 

• • • • 

2601 ACTAGGGAATCTAGAGTTTCTCGAAGAGAAACCATTATTA 2640
T C C T GG C 

• • • * 

2641 GGGGAAGCACTAGCTCGTGTGAAAAGAGCGGAGAAGAAGT 2680
T T C G A 

2681 GGAGAGACAAACGAGAGAAACTGCAGTTGGAAACAAATAT 2720
G T CG A G T C 

• • • • 

2721 TGTTTATAAAGAGGCAAAAGAATCTGTAGATGCTTTATTT 2760
C CG C GCG GC 

• • • • 

2761 GTAAACTCTCAATATGATAGATTACAAGTGGATACGAACA 2800
G C CAG G CC C C 

• • • • 

2801 TCGCCATGATTCATGCGGCAGATAAACGCGTTCATAGAAT 2840
CCC C TGCC 
2841 CCGGGAAGCGTATCTGCCAGAGTTGTCTGTGATTCCAGGT 2880
TTGTCT T C CT 

2881 GTCAATGCGGCCATTTTCGAAGAATTAGAGGGACGTATTT 2920 
GCT C GCT C 

2921 TTACAGCGTATTCCTTATATGATGCGAGAAATGTCATTAA 2960
CATC GC C C C 

2961 AAATGGCGATTTCAATAATGGCTTATTATGCTGGAACGTG 3000
G C T C C C CAGC T 

3001 AAAGGTCATGTAGATGTAGAAGAGCAAAACAACCACCGTT 3040
GCGGAG TG 

3041 CGGTCCTTGTTATCCCAGAATGGGAGGCAGAAGTGTCACA 3080
C GGGTG AT C 

3081 AGAGGTTCGTGTCTGTCCAGGTCGTGGCTATATCCTTCGT 3120
A A A C T C

3121 GTCACAGCATATAAAGAGGGATATGGAGAGGGCTGCGTAA 3160
GCTCG CT T G 

3161 CGATCCATGAGATCGAAGACAATACAGACGAACTGAAATT 3200
C C GACC GTG 

3201 CAGCAACTGTGTAGAAGAGGAAGTATATCCAAACAACACA 3240
TC CC GAAC C C

3241 GTAACGTGTAATAATTATACTGGGACTCAAGAAGAATATG 3280
TTCCGCC TAG GC 

3281 AGGGTACGTACACTTCTCGTAATCAAGGATATGACGAAGC 3320
GA G C AGC CAG T CA 

3321 CTATGGTAATAACCCTTCCGTACCAGCTGATTACGCTTCA 3360
TCC TCXXXXXXXXXXXX T T C T C C 

3361 GTCTATGAAGAAAAATCGTATACAGATGGACGAAGAGAGA 3400
G C G G C C CA C T

3401 ATCCTTGTGAATCTAACAGAGGCTATGGGGATTACACACC 3440 
C C G TC T CA C 

3441 ACTACCGGCTGGTTATGTAACAAAGGATTTAGAGTACTTC 3480
TAT C TC GCT T 

3481 CCAGAGACCGATAAGGTATGGATTGAGATCGGAGAAACAG 3520
T CAGC T C 

3521 AAGGAACATTCATCGTGGATAGCGTGGAATTACTCCTT&T 3560 
G C C GC T T G 
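Each numbered row of the figure gives a start position, a run of nucleotides and an end position, so a row can be checked mechanically: the number of bases must equal the span of positions, and only A, C, G and T may appear. The following is a minimal sketch in Python, assuming that row layout; the names `parse_row` and `gc_fraction` are hypothetical, and the G+C calculation is included only as an example of a per-row measurement one might make when comparing sequences.

```python
from __future__ import annotations

import re

# Assumed row layout: start position, nucleotides (OCR may have split them
# with spaces), end position, e.g. "1881 AGCAC AAAAGG... 1920".
ROW = re.compile(r"^\s*(\d+)\s+([ACGT\s]+?)\s+(\d+)\s*$")


def parse_row(line: str) -> tuple[int, int, str]:
    """Return (start, end, sequence) for one numbered row (hypothetical helper).

    Raises ValueError if anything other than A, C, G, T or whitespace lies
    between the two positions, or if the base count does not match the span.
    """
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"not a numbered sequence row: {line!r}")
    start, end = int(match.group(1)), int(match.group(3))
    seq = re.sub(r"\s+", "", match.group(2))  # drop spaces inserted by OCR
    expected = end - start + 1
    if expected <= 0 or len(seq) != expected:
        raise ValueError(f"row {start}-{end}: expected {expected} bases, got {len(seq)}")
    return start, end, seq


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the bases of a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    start, end, seq = parse_row("1881 AGCAC AAAAGGCGGTGAATGCCCTGTTT ACTTCTTCC AAT 1920")
    print(start, end, len(seq), round(gc_fraction(seq), 3))  # 1881 1920 40 0.45
```

A row in which a base has been misread as a digit, such as "7" in place of "T", fails the pattern and raises ValueError, which makes such reading errors easy to locate.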
1 AGATCTAGAGGTAATTGTTATGAGTACTGTCGTGGTTAAG 40
GATC 

41 GGAAACGTCAACGGTGGTGTACAACAACCTAGAAGGAGGA 80 
G T A 

81 GAAGGCAATCCCTTCGCAGGAGGGCTAACAGAGTACAGCC 120 

T A T 

121 AGTGGTTATGGTCACTGCTCCTGGCGAACCCAGGAGGAGG 160 

GC A A A 

161 AGACGCAGAAGAGGAGGCAATCGCAGGTCAAGAAGAACTG 200 
AG T A 

201 GAGTTCCCAGGGGAAGGGGCTCAAGCGAGACATTCGTGTT 240 
A AT 

241 TACAAAGGACAACCTCGTGGGCAACTCCCAAGGAAGTTTC 280 

281 ACCTTCGGACCAAGTGTATCAGACTGTCCAGCATTCAAGG 320 

T 

321 ATGGAATACTCAAGGCCTACCATGAGTACAAGATCACAAG 360 

T 

361 TATCCTTCTTCAGTTCGTCAGCGAGGCCTCTTCCACCTCA 400 
T G T 

401 CCAGGATCCATCGCTTATGAGTTGGACCCACATTGCAAAG 440
C AT 

441 TATCATCCCTCCAGTCCTACGTCAACAAGTTCCAAATCAC 480 
T 

481 AAAGGGAGGAGCTAAGACCTATCAAGCTAGGATGATCAAC 520 
T T C T 

521 GGAGTAGAATGGCACGATTCATCTGAGGATCAGTGCAGGA 560 
T T A 

561 TACTTTGGAAAGGAAGTGGAAAATCTTCAGACCCAGCAGG 600 
C A G T T 

601 ATCTTTCAGAGTCACCATCAGAGTGGCTCTTCAAAACCCC 640 
T T A 

641 AAGTAATAGACTCCGGATCAGAGCCTGGTCCAAGCCCACA 680 

A T 
681 ACCAACACCCACTCCAACTCCCCAAAAGCATGAGCGATTT 720
721 ATTGCTTACGTCGGCATACCTATGCTGACCATTCAAGAAT 760 
761 TC 762